Will multiple sequences be supported for LLamaCpp backend? #1022
alfie-nsugh started this conversation in Ideas
With something like this, the different generations can be seen as different sequences. With something like LLamaCpp, you can do parallel decoding by setting `batch.n_seq_id[j] = 1` and giving each generation its own sequence ID, so one `llama_decode` call advances all of them at once. Will optimizations like this be supported at some point?
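
For reference, here is a minimal sketch of what that looks like against llama.cpp's C API, assuming the `llama_batch` layout from `llama.h`; the `add_token` helper and the buffer sizes are my own illustration (not part of the library), and error handling is omitted:

```cpp
// Sketch: pack tokens from two independent generations into one llama_batch,
// tagging each token with its sequence ID, so a single llama_decode() call
// processes both sequences in parallel against separate KV-cache streams.
#include "llama.h"

// Illustrative helper: append one token to the batch under a given sequence.
static void add_token(llama_batch & batch, llama_token tok, llama_pos pos, llama_seq_id seq) {
    const int i = batch.n_tokens;
    batch.token[i]     = tok;
    batch.pos[i]       = pos;   // position within its own sequence, not the batch
    batch.n_seq_id[i]  = 1;     // this token belongs to exactly one sequence
    batch.seq_id[i][0] = seq;   // which sequence (KV-cache stream) it belongs to
    batch.logits[i]    = false; // only request logits where we intend to sample
    batch.n_tokens++;
}

// Usage (ctx is an already-initialized llama_context; tokens/positions are
// placeholders for the current tip of each generation):
//
//   llama_batch batch = llama_batch_init(/*n_tokens=*/512, /*embd=*/0, /*n_seq_max=*/1);
//   add_token(batch, tok_for_gen0, /*pos=*/7, /*seq=*/0);
//   add_token(batch, tok_for_gen1, /*pos=*/3, /*seq=*/1);
//   for (int i = 0; i < batch.n_tokens; ++i) {
//       batch.logits[i] = true; // sample the next token of each generation
//   }
//   llama_decode(ctx, batch);  // both sequences advance in one forward pass
//   llama_batch_free(batch);
```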