[Bug]: Performance improvements on RAG analyzer for long text #2159
Labels: bug (Something isn't working)
yingfeng added a commit that referenced this issue on Nov 5, 2024:

### What problem does this PR solve?

- Split long text into shorter pieces. Smarter splitting still needs further work.
- Fix a deadlock in memory_indexer during offline building.

Issue link: #2159

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Performance Improvement
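Splitting long text before tokenization can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the helper name `split_long_text`, the `max_len` limit, and the rule of cutting right after sentence-ending punctuation (with a hard cut as fallback) are all hypothetical. The point is that when the segmentation search is super-linear in input length, feeding it bounded-length pieces caps the work done per call.

```python
def split_long_text(text: str, max_len: int = 64) -> list[str]:
    """Hypothetical helper: cut text into pieces of at most max_len characters.

    Prefers to cut right after a sentence-ending punctuation mark inside the
    current window; if the window has none, it falls back to a hard cut.
    """
    pieces = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            window = text[start:end]
            # Last punctuation position in the window, or -1 if there is none.
            cut = max(window.rfind(p) for p in "。！？；.!?;")
            if cut != -1:
                end = start + cut + 1  # keep the punctuation with its sentence
        pieces.append(text[start:end])
        start = end
    return pieces


print(split_long_text("abc. defgh. ijklmnop", 8))
```

Because the pieces concatenate back to the original string, no characters are lost. The trade-off, as the commit message notes, is that a naive cut can split a word the analyzer would otherwise keep whole, which is why smarter splitting is still listed as future work.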
yingfeng changed the title from "[Bug]: Performance improvements on RAG analyzer for long text" to "[Feature Request]: Performance improvements on RAG analyzer for long text" on Nov 13, 2024.
yingfeng changed the title from "[Feature Request]: Performance improvements on RAG analyzer for long text" back to "[Bug]: Performance improvements on RAG analyzer for long text" on Nov 13, 2024.
yingfeng pushed a commit that referenced this issue on Nov 15, 2024:

### What problem does this PR solve?

* Add the `RAGAnalyzer::GetBestTokens` function. It is currently used only for testing in Debug mode, and its results differ from those of the DFS search in some cases.
* Fix some existing bugs.

Issue link: #2159

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
- [x] Test cases
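A DFS over all segmentations visits every way of cutting the text, which is exponential in its length in the worst case; dynamic programming finds the best-scoring segmentation in roughly text length × maximum word length steps. The sketch below compares the two on a toy lexicon. Everything here is hypothetical: `LEXICON`, its integer scores, `UNKNOWN_CHAR_SCORE`, `MAX_WORD_LEN`, and the function names are not taken from the RAG analyzer, and the score is assumed to be a plain sum of per-token scores. Under that assumption DP and DFS agree on the best score. If the analyzer's real score function is not additive (for example, if it normalizes by the number of tokens), or if ties are broken differently, the two can pick different segmentations, which may be one reason the commit says `GetBestTokens` differs from DFS in some cases.

```python
# Hypothetical toy lexicon: word -> additive score (higher is better).
LEXICON = {"a": 1, "b": 1, "c": 1, "d": 1, "ab": 5, "abc": 6, "cd": 4}
UNKNOWN_CHAR_SCORE = -10
MAX_WORD_LEN = 3


def candidates(text, i):
    """Yield (word, score) for lexicon words starting at i; unknown single chars get a penalty."""
    for length in range(1, min(MAX_WORD_LEN, len(text) - i) + 1):
        word = text[i:i + length]
        if word in LEXICON:
            yield word, LEXICON[word]
        elif length == 1:
            yield word, UNKNOWN_CHAR_SCORE


def best_tokens_dfs(text):
    """Exhaustive search: tries every segmentation (exponential in the worst case)."""
    best = (float("-inf"), [])

    def dfs(i, score, path):
        nonlocal best
        if i == len(text):
            if score > best[0]:
                best = (score, list(path))
            return
        for word, s in candidates(text, i):
            path.append(word)
            dfs(i + len(word), score + s, path)
            path.pop()

    dfs(0, 0, [])
    return best


def best_tokens_dp(text):
    """DP: best[j] is the best score for text[:j]; back[j] is where its last token starts."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0
    for i in range(n):
        if best[i] == float("-inf"):
            continue  # position i is unreachable
        for word, s in candidates(text, i):
            j = i + len(word)
            if best[i] + s > best[j]:
                best[j] = best[i] + s
                back[j] = i
    # Walk the back-pointers from the end to rebuild the token list.
    tokens, j = [], n
    while j > 0:
        i = back[j]
        tokens.append(text[i:j])
        j = i
    return best[n], tokens[::-1]


print(best_tokens_dp("abcd"), best_tokens_dfs("abcd"))
```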
yingfeng pushed a commit that referenced this issue on Nov 15, 2024:

### What problem does this PR solve?

- Support a new score function.
- Support getting the top-N results by dynamic programming (DP).

Issue link: #2159

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
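Getting the top-N segmentations by DP can be sketched by keeping up to N best partial results per position instead of a single one. This is an illustration under assumptions, not the analyzer's implementation: the additive score, the toy lexicon, and the name `best_tokens_topn` are hypothetical. With an additive score, every top-N segmentation of a prefix extends a top-N segmentation of some shorter prefix, so pruning each position to N entries loses nothing (up to ties).

```python
import heapq

# Hypothetical toy lexicon: word -> additive score (higher is better).
LEXICON = {"a": 1, "b": 1, "c": 1, "d": 1, "ab": 5, "abc": 6, "cd": 4}
UNKNOWN_CHAR_SCORE = -10
MAX_WORD_LEN = 3


def candidates(text, i):
    """Yield (word, score) for lexicon words starting at i; unknown single chars get a penalty."""
    for length in range(1, min(MAX_WORD_LEN, len(text) - i) + 1):
        word = text[i:i + length]
        if word in LEXICON:
            yield word, LEXICON[word]
        elif length == 1:
            yield word, UNKNOWN_CHAR_SCORE


def best_tokens_topn(text, n_best):
    """Return up to n_best (score, tokens) pairs for the whole text, best first."""
    n = len(text)
    # beams[i] collects (score, tokens) for segmentations of text[:i].
    beams = [[] for _ in range(n + 1)]
    beams[0] = [(0, [])]
    for i in range(n):
        # Every entry ending at i has been added by now, so prune to the n_best.
        beams[i] = heapq.nlargest(n_best, beams[i], key=lambda e: e[0])
        for score, tokens in beams[i]:
            for word, s in candidates(text, i):
                # Copies the token list: simple, but memory grows with path length.
                beams[i + len(word)].append((score + s, tokens + [word]))
    return heapq.nlargest(n_best, beams[n], key=lambda e: e[0])


for score, tokens in best_tokens_topn("abcd", 3):
    print(score, tokens)
```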
yingfeng pushed a commit that referenced this issue on Nov 20, 2024:

### What problem does this PR solve?

Improve the performance of `RAGAnalyzer::GetBestTokensTopN` and reduce its memory cost.

Issue link: #2159

### Type of change

- [x] Refactoring
- [x] Performance Improvement
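One common way to reduce the memory of a top-N DP, shown here as an assumption rather than as what this commit actually changed, is to store back-pointers instead of copying a token list into every beam entry. Each entry below holds only `(score, prev_pos, prev_rank)`, where `prev_rank` indexes the pruned beam at `prev_pos`, and tokens are rebuilt once at the end by walking the pointers. The toy lexicon, scores, and the name `best_tokens_topn_backptr` are hypothetical.

```python
import heapq

# Hypothetical toy lexicon: word -> additive score (higher is better).
LEXICON = {"a": 1, "b": 1, "c": 1, "d": 1, "ab": 5, "abc": 6, "cd": 4}
UNKNOWN_CHAR_SCORE = -10
MAX_WORD_LEN = 3


def candidates(text, i):
    """Yield (word, score) for lexicon words starting at i; unknown single chars get a penalty."""
    for length in range(1, min(MAX_WORD_LEN, len(text) - i) + 1):
        word = text[i:i + length]
        if word in LEXICON:
            yield word, LEXICON[word]
        elif length == 1:
            yield word, UNKNOWN_CHAR_SCORE


def best_tokens_topn_backptr(text, n_best):
    """Top-N segmentations where each beam entry is (score, prev_pos, prev_rank)."""
    n = len(text)
    beams = [[] for _ in range(n + 1)]
    beams[0] = [(0, -1, -1)]  # sentinel: no predecessor
    for i in range(n):
        # Prune once; beams[i] is never modified again, so ranks stay valid.
        beams[i] = heapq.nlargest(n_best, beams[i], key=lambda e: e[0])
        for rank, (score, _, _) in enumerate(beams[i]):
            for word, s in candidates(text, i):
                beams[i + len(word)].append((score + s, i, rank))
    beams[n] = heapq.nlargest(n_best, beams[n], key=lambda e: e[0])

    results = []
    for score, pos, rank in beams[n]:
        tokens, j = [], n
        while pos != -1:
            tokens.append(text[pos:j])  # last token of this prefix
            j = pos
            pos, rank = beams[pos][rank][1], beams[pos][rank][2]
        results.append((score, tokens[::-1]))
    return results


print(best_tokens_topn_backptr("abcd", 3))
```

Each stored entry is a constant-size tuple, so the beams no longer hold one copied token list per entry; the token strings are materialized only for the N final results.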
### Is there an existing issue for the same bug?

### Version or Commit ID

f9ef948

### Other environment information

No response

### Actual behavior and How to reproduce it

Related to #2147; the OOM issue has been fixed by #2154.

Performance should also be improved: tokenizing the following long text takes more than 80 s.

### Expected behavior

No response

### Additional information

No response