I'm trying to transcribe this audio file where the speaker says a sentence at the very beginning and then again at the very end (about 420 seconds).
I use the VAD filter so that Whisper doesn't hallucinate a string of "Thank you"s during every 30 seconds or so of silence, but it combines the first and last sentences into one chunk and gives me output like this:
3s -> 420s: Sentence 1. Sentence 2.
I want it to look like this:
3s -> 4s: Sentence 1.
419s -> 420s: Sentence 2.
Any ideas on how to improve this?
I've tried using word_timestamps and hallucination_silence_threshold instead of the VAD filter, but that always leads to repetitive hallucinations during the silence. I've also tried adjusting vad_parameters, but I can't get it to split.
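One possible workaround, sketched here rather than confirmed: keep the VAD filter on, also enable word_timestamps, and split each returned segment wherever two consecutive words are separated by a long gap. This assumes the word timestamps are reported on the original audio timeline, so the silence shows up as a gap between words. The `Word` class, `split_on_gaps`, and the `max_gap` threshold below are hypothetical names for illustration; `Word` only mirrors the `start`, `end`, and `word` fields of word-level results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical stand-in for a word-level result (start/end in seconds).
    start: float
    end: float
    word: str


def split_on_gaps(words: List[Word], max_gap: float = 2.0) -> List[Tuple[float, float, str]]:
    """Group words into chunks, starting a new chunk whenever the silence
    between two consecutive words exceeds max_gap seconds."""
    chunks: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        # A long pause since the previous word closes the current chunk.
        if current and w.start - current[-1].end > max_gap:
            chunks.append(current)
            current = []
        current.append(w)
    if current:
        chunks.append(current)
    # Word texts are assumed to carry their own leading space, so join with "".
    return [
        (c[0].start, c[-1].end, "".join(x.word for x in c).strip())
        for c in chunks
    ]
```

Passing the words of each segment through `split_on_gaps` would turn a single 3s -> 420s segment into two entries, provided the word timestamps on both sides of the silence are accurate. The 2-second default is arbitrary and would need tuning for the audio in question.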