
Use correct features padding for encoder input #1101

Merged · 2 commits · Oct 29, 2024

Conversation

@MahmoudAshraf97 (Collaborator) commented Oct 28, 2024

This aligns the faster-whisper implementation with the original OpenAI implementation, as discussed in #1084.

These are WER comparisons before and after the change.
The figures can be reproduced by running benchmarks/yt_commons.py and switching batched inference to sequential, with the following settings:

word_timestamps=False,
without_timestamps=True,
vad_filter=True,
| Model | Before WER | After WER |
| --- | --- | --- |
| distil-large-v3 | 26.277 | 14.762 |
| distil-large-v2 | 71.848 | 81.456 |
| distil-medium.en | 68.044 | 66.565 |
| distil-small.en | 68.719 | 67.220 |

The performance regression of distil-large-v2 can be ignored, because distil-large-v3 should be used as a drop-in replacement.

This should also affect all use cases where the chunk length is less than 30 seconds, for all models.
There is also an average relative WER improvement of around 2% for batched inference across all models:

| Model | Before WER | After WER |
| --- | --- | --- |
| tiny.en | 15.437 | 15.063 |
| tiny | 21.765 | 21.390 |
| base.en | 14.300 | 13.816 |
| base | 17.709 | 17.251 |
| small.en | 13.054 | 12.617 |
| small | 16.413 | 16.088 |
| medium.en | 13.299 | 12.894 |
| medium | 15.991 | 15.593 |
| large-v1 | 19.458 | 19.590 |
| large-v2 | 15.237 | 15.148 |
| large-v3 | 16.514 | 15.997 |
| large-v3-turbo | 14.576 | 14.044 |
| distil-small.en | 14.004 | 13.918 |
| distil-medium.en | 14.074 | 13.972 |
| distil-large-v2 | 13.419 | 13.574 |
| distil-large-v3 | 13.688 | 13.533 |

@MahmoudAshraf97 changed the title from "Improve WER of distil models" to "Use correct features padding for encoder input" on Oct 29, 2024
@MahmoudAshraf97 merged commit 2386843 into SYSTRAN:master on Oct 29, 2024
3 checks passed
@MahmoudAshraf97 deleted the fix_features_padding branch on October 30, 2024
@MahmoudAshraf97 (Collaborator, Author) commented:

This PR disabled the ability to change the encoder input and output dimensions (audio_ctx): any chunk_length passed to transcribe will be respected, but the encoder input will be padded to the 30-second equivalent regardless of the chunk length.
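The padding behavior described above can be sketched as follows. This is a minimal illustration, not the actual faster-whisper internals: it assumes 80 mel bins and a 10 ms hop (so 30 s ≈ 3000 frames), and `pad_to_30s` is a hypothetical helper name.

```python
import numpy as np

# 30 s of audio at 100 mel frames per second (10 ms hop) -- the fixed
# encoder input length the features are padded to, regardless of chunk_length.
N_FRAMES = 3000

def pad_to_30s(features: np.ndarray, target: int = N_FRAMES) -> np.ndarray:
    """Zero-pad (or truncate) mel features of shape (n_mels, n_frames)
    along the time axis to exactly `target` frames."""
    n_mels, n_frames = features.shape
    if n_frames >= target:
        return features[:, :target]
    padded = np.zeros((n_mels, target), dtype=features.dtype)
    padded[:, :n_frames] = features
    return padded
```

With this scheme, a 15-second chunk (about 1500 frames) still produces a (80, 3000) encoder input, with the trailing half zero-padded, matching the original OpenAI behavior.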

#171 (comment)
