Use simplified token counting method in case of the big files (#6014)
## Changes

Tokenising huge files with tiktoken is very costly, and we can save a lot of CPU by simplifying it. The number of tokens is never smaller than the number of words, so if the number of words exceeds `EXTENDED_USER_CONTEXT_TOKEN_BUDGET` we can just return the word count instead. For determining whether a file can be used as user context, the result remains equally correct. For other purposes (if any), accuracy may suffer.

## Test plan

1. Build with the JetBrains plugin.
2. Open a big file (a few MB of text).
3. Select a text fragment and move the mouse cursor while holding the left mouse button, so that the selection keeps changing.

Without these changes, CPU usage jumps to 100% and stays there for a minute. With these changes, it should drop back to single-digit numbers in 2-3 seconds.
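
The shortcut described under Changes could look roughly like the sketch below. It is illustrative only and is not the code from this commit: `countWords`, `countTokensCheaply`, the injected `exactCount` callback (standing in for the tiktoken-based counter), and the budget value are all assumptions made for the example.

```typescript
// Hypothetical value for illustration; the real EXTENDED_USER_CONTEXT_TOKEN_BUDGET
// is defined elsewhere in the codebase.
const EXTENDED_USER_CONTEXT_TOKEN_BUDGET = 30_000

// Stand-in for the expensive tiktoken-based counter (assumed signature).
type Tokenizer = (text: string) => number

// Counts runs of non-whitespace characters in a single pass, without
// allocating an array of words. Only ASCII whitespace is treated as a separator.
function countWords(text: string): number {
    let count = 0
    let inWord = false
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i)
        const isSpace = code === 32 || (code >= 9 && code <= 13)
        if (isSpace) {
            inWord = false
        } else if (!inWord) {
            inWord = true
            count++
        }
    }
    return count
}

// If the word count alone already exceeds the budget, return it and skip
// tokenisation; otherwise fall back to the exact (expensive) count.
function countTokensCheaply(
    text: string,
    exactCount: Tokenizer,
    budget: number = EXTENDED_USER_CONTEXT_TOKEN_BUDGET
): number {
    const words = countWords(text)
    if (words > budget) {
        return words
    }
    return exactCount(text)
}
```

A caller that only checks whether the result is within the budget gets the same answer either way, because a word count above the budget implies a token count above it as well. Callers that need the exact number for large inputs would see an underestimate.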