Question about implementation of top-k sampling (5.3.2 Top-k sampling) #326

Answered by rasbt
labdmitriy asked this question in Q&A

Good call. I definitely could have used torch.inf here. Writing it the old way is just muscle memory at this point, since I believe torch.inf didn't exist in early versions of PyTorch (pre-2.0 or so).

Overall, I like your alternative implementation. Thanks for sharing that! I think the fact that it doesn't allow duplicates can be seen as either a pro or a con.

In my implementation, if you have a top 3 setting and duplicates like in

[0.412314, 0.412314, -0.5, 0.1, 0.2, 1.0, 0.8, ...]

it will not strictly be top 3 anymore but top 3+, e.g.,

[0.412314, 0.412314, 1.0, 0.8, ...]

and the sampling will treat both tokens with equal probability. In your implementation, it will choose one over the other (I think the one that has the lower…
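To make the difference concrete, here is a minimal sketch of both behaviors. The variable names and exact masking calls are illustrative and may differ from the book's listing:

```python
import torch

logits = torch.tensor([0.412314, 0.412314, -0.5, 0.1, 0.2, 1.0, 0.8])
top_k = 3

# Threshold-based masking (the behavior described above): every logit below
# the k-th largest value is set to -inf, so duplicates of the threshold
# value all survive, i.e., effectively "top 3+".
top_logits, _ = torch.topk(logits, top_k)
masked = logits.masked_fill(logits < top_logits[-1], -torch.inf)
print(torch.softmax(masked, dim=-1))
# Four nonzero probabilities; the two tied tokens get equal probability.

# Index-based masking (the alternative): only the k positions returned by
# torch.topk survive, so exactly one of the two tied tokens is kept
# (which one is a PyTorch implementation detail).
_, top_pos = torch.topk(logits, top_k)
masked_alt = torch.full_like(logits, -torch.inf)
masked_alt[top_pos] = logits[top_pos]
print(torch.softmax(masked_alt, dim=-1))
# Exactly three nonzero probabilities.
```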

Answer selected by labdmitriy