- From the article it seems to be for a specific model; I'm not sure if the whisper pipeline is supported.
- Their demo just shows that model, but the assistant_model argument is available on any transformers model.generate() call. For example:
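A minimal sketch of such a call; the model pair here (openai/whisper-large-v2 as the main model, distil-whisper/distil-large-v2 as the draft model) and the dummy audio dataset are illustrative choices, not from the original comment:

```python
import torch
from datasets import load_dataset
from transformers import AutoProcessor, WhisperForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2"
).to(device)

# Smaller draft model that proposes tokens for the main model to verify.
assistant = WhisperForConditionalGeneration.from_pretrained(
    "distil-whisper/distil-large-v2"
).to(device)

# Any 16 kHz audio works; this dummy dataset keeps the example self-contained.
sample = load_dataset(
    "hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
)[0]["audio"]
inputs = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
)

# Passing assistant_model is all it takes to turn on assisted decoding.
predicted_ids = model.generate(
    inputs.input_features.to(device),
    assistant_model=assistant,
)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```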
- OK, so it's something deeper down in the C++ code in CTranslate2. You could open a feature request there, but I'm not sure if Guillaume has time to implement it, since he just got a new position.
-
I'm wondering if anyone has experience trying to combine the benefits of assisted decoding with the benefits of the quantization we get in faster-whisper. I've done some experimenting with the Hugging Face implementation, and it does give nice latency improvements, but even with assisted decoding I find it's still slower than using faster-whisper. It would be quite nice to have both combined.
I'm wondering if there's already a good approach to doing this. I looked into exporting a checkpoint from Hugging Face with assisted decoding included, but it seems assisted decoding is a decoding strategy rather than part of the actual model that gets exported, so it won't be included when we run CTranslate2.
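For reference, a minimal sketch of the export path being described, assuming ctranslate2 and faster-whisper are installed (the output directory, quantization setting, and audio file are illustrative placeholders). The converter captures only the model weights and graph, while decoding strategies live in the Python generate() loop:

```python
import ctranslate2
from faster_whisper import WhisperModel

# The converter exports only the model weights/graph (here with int8
# quantization); generation-time strategies such as assisted decoding
# are not part of the exported artifact.
converter = ctranslate2.converters.TransformersConverter("openai/whisper-large-v2")
converter.convert("whisper-large-v2-ct2", quantization="int8")

# faster-whisper then runs its own decoding loop on the CTranslate2 model,
# so the HF assistant_model mechanism does not carry over.
model = WhisperModel("whisper-large-v2-ct2", compute_type="int8")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```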