Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to disable duplicates in topk #464

Open
wants to merge 2 commits into
base: habana_main
Choose a base branch
from

Conversation

kdamaszk
Copy link

@kdamaszk kdamaszk commented Nov 6, 2024

Current implementation of optimized topp/topk calculations for scalar case is handling the duplicates that are outside of kth border. Unfortunately, to analyze duplicates it is necessary to make a synchronization with CPU, what makes multi-step scheduling useless together with topp/topk.

This PR adds option to skip duplicates handling with VLLM_HANDLE_TOPK_DUPLICATES (default True). When this variable is set, handling duplicates will be skipped and we will avoid synchronization with CPU. It also removes the synchronization which was done earlier in Sampler, by saving scalar value of top_k and top_p. It should give performance gain for all benchmarks with these sampling parameters, especially together with multi-step scheduling.

While disabling the duplicates handling may cause small accuracy differences, the best solution will be to handle duplicates without synchronization with CPU. However, this is not a trivial problem, so I will try to provide such solution later.

@kdamaszk kdamaszk marked this pull request as draft November 6, 2024 09:18
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/topk-disable-duplicates branch from 0c0b46a to 6981936 Compare November 6, 2024 10:09
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/topk-disable-duplicates branch from 6981936 to 695278c Compare November 6, 2024 14:06
@kdamaszk kdamaszk marked this pull request as ready for review November 6, 2024 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant