- From the article it seems to be for a specific model; I'm not sure if the whisper pipeline is supported.
- Their demo just shows that model, but the assistant_model argument is available on any transformers model.generate() call. For example:
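A minimal sketch of such a call; the model pair here (openai/whisper-large-v2 as the main model, distil-whisper/distil-large-v2 as the draft model) and the dummy audio dataset are illustrative choices, not from the original comment:

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2"
).to(device)

# Smaller draft model that proposes tokens for the main model to verify.
assistant = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2"
).to(device)

# Any 16 kHz audio works; this dummy dataset keeps the example self-contained.
sample = load_dataset(
    "hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
)[0]["audio"]
inputs = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
)

# Passing assistant_model is all it takes to turn on assisted decoding.
predicted_ids = model.generate(
    inputs.input_features.to(device),
    assistant_model=assistant,
)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```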
- OK, so it's something deeper down in the C++ code in CTranslate2. You could open a feature request there, but I'm not sure if Guillaume has time to implement it, since he just got a new position.
-
I'm wondering if anyone has experience trying to combine the benefits of assisted decoding with the benefits of the quantization we get in faster-whisper. I've done some experimenting with the Hugging Face implementation, and it does give nice latency improvements, but even with assisted decoding I find it's still slower than using faster-whisper. It would be quite nice to have both combined.
I'm wondering if there's already a good approach to doing this. I looked into exporting a checkpoint from Hugging Face with assisted decoding included, but it seems assisted decoding is a decoding strategy rather than part of the actual model that gets exported, so it won't be included when we run CTranslate2.
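For reference, a minimal sketch of the export path being described, assuming ctranslate2 and faster-whisper are installed (the output directory, quantization setting, and audio file are illustrative placeholders). The converter captures only the model weights and graph, while decoding strategies live in the Python generate() loop:

```python
import ctranslate2
from faster_whisper import WhisperModel

# The converter exports only the model weights/graph (here with int8
# quantization); generation-time strategies such as assisted decoding
# are not part of the exported artifact.
converter = ctranslate2.converters.TransformersConverter("openai/whisper-large-v2")
converter.convert("whisper-large-v2-ct2", quantization="int8")

# faster-whisper then runs its own decoding loop on the CTranslate2 model,
# so the HF assistant_model mechanism does not carry over.
model = WhisperModel("whisper-large-v2-ct2", compute_type="int8")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```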