Add option to disable duplicates in topk #464

kdamaszk · 2024-11-06T08:51:59Z

The current implementation of the optimized topp/topk calculation for the scalar case handles duplicates that fall outside the k-th border. Unfortunately, analyzing duplicates requires a synchronization with the CPU, which makes multi-step scheduling useless when combined with topp/topk.
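To make "duplicates outside of the k-th border" concrete, here is a minimal pure-Python sketch. It is an illustration only, not the code in this PR: the function name `topk_kept_indices` and the `handle_duplicates` argument are hypothetical. It assumes that handling duplicates means also keeping tokens past position k whose logit ties with the k-th value.

```python
def topk_kept_indices(logits, k, handle_duplicates):
    """Return the sorted indices of tokens kept by top-k filtering.

    Hypothetical helper for illustration; not the Sampler implementation.
    """
    # Indices ordered by descending logit (Python's sort is stable, so
    # ties keep their original relative order).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:k]
    if handle_duplicates:
        kth_value = logits[order[k - 1]]
        # Also keep tokens beyond position k that tie with the k-th value.
        for i in order[k:]:
            if logits[i] != kth_value:
                break
            kept.append(i)
    return sorted(kept)


logits = [2.0, 1.0, 1.0, 1.0, 0.5]
print(topk_kept_indices(logits, 2, handle_duplicates=True))   # [0, 1, 2, 3]
print(topk_kept_indices(logits, 2, handle_duplicates=False))  # [0, 1]
```

With duplicate handling, the number of kept tokens depends on the data (four here instead of two), which is one plausible reason a device implementation would need to read a value back on the host. Without it, exactly k tokens are kept and ties beyond the border are dropped.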

This PR adds an option to skip duplicate handling via VLLM_HANDLE_TOPK_DUPLICATES (default True). When this variable is set to false, duplicate handling is skipped and the synchronization with the CPU is avoided. The PR also removes the synchronization that was previously done in Sampler, by saving the scalar values of top_k and top_p. This should give a performance gain for all benchmarks that use these sampling parameters, especially together with multi-step scheduling.
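As a rough sketch of how such a switch could be read, the snippet below parses VLLM_HANDLE_TOPK_DUPLICATES as a boolean. The helper name `handle_topk_duplicates`, the accepted spellings ("1", "true"), and the `env` parameter are assumptions made for illustration. The default of True follows the description above, and a false value is assumed to mean that duplicate handling is skipped.

```python
import os


def handle_topk_duplicates(env=None, default=True):
    """Return True if top-k duplicates should be handled.

    Hypothetical helper; the parsing rules here are an assumption,
    not the logic used in this PR.
    """
    env = os.environ if env is None else env
    raw = env.get("VLLM_HANDLE_TOPK_DUPLICATES")
    if raw is None:
        # Variable not set: fall back to the documented default.
        return default
    return raw.strip().lower() in ("1", "true")


if not handle_topk_duplicates():
    # Fast path: no duplicate analysis, so no CPU synchronization.
    print("skipping duplicate handling")
```

Passing a plain dict as `env` makes the helper easy to check without touching the real process environment.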

Because disabling duplicate handling may cause small accuracy differences, the best solution would be to handle duplicates without synchronizing with the CPU. However, this is not a trivial problem, so I will try to provide such a solution later.

kdamaszk marked this pull request as draft November 6, 2024 09:18

kdamaszk force-pushed the dev/kdamaszke/topk-disable-duplicates branch from 0c0b46a to 6981936 Compare November 6, 2024 10:09

kdamaszk added 2 commits November 6, 2024 16:06

Add option to disable duplicates in topk

e4d6ab6

Use top_p as a scalar

695278c

kdamaszk force-pushed the dev/kdamaszke/topk-disable-duplicates branch from 6981936 to 695278c Compare November 6, 2024 14:06

kdamaszk marked this pull request as ready for review November 6, 2024 14:24

kdamaszk requested review from michalkuligowski and kzawora-intel November 6, 2024 14:24
