Improve asr (whisper) with speaker diarization result #1144
yinruiqing started this conversation in Ideas
Replies: 1 comment
-
I tried this here. One way to improve it would be to merge adjacent speech turns from the same speaker so that Whisper can benefit from more context. This does not solve the overlapping speech issue, however.
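To make the merging idea concrete, here is a minimal sketch, not the code from the linked attempt. It assumes diarization output has already been flattened into `(start, end, speaker)` records; the `SpeechTurn` type, the `merge_adjacent_turns` helper, and the `max_gap` threshold are all hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class SpeechTurn(NamedTuple):
    """One diarization turn (hypothetical container, times in seconds)."""
    start: float
    end: float
    speaker: str


def merge_adjacent_turns(turns: List[SpeechTurn], max_gap: float = 0.5) -> List[SpeechTurn]:
    """Merge consecutive turns of the same speaker separated by at most `max_gap` seconds.

    `max_gap` is an assumed tuning knob, not a value taken from the discussion.
    """
    merged: List[SpeechTurn] = []
    for turn in sorted(turns, key=lambda t: t.start):
        if (
            merged
            and merged[-1].speaker == turn.speaker
            and turn.start - merged[-1].end <= max_gap
        ):
            last = merged[-1]
            # Extend the previous turn instead of starting a new one.
            merged[-1] = SpeechTurn(last.start, max(last.end, turn.end), last.speaker)
        else:
            merged.append(turn)
    return merged


example_turns = [
    SpeechTurn(0.0, 1.0, "A"),
    SpeechTurn(1.2, 2.0, "A"),
    SpeechTurn(2.1, 3.0, "B"),
    SpeechTurn(3.1, 4.0, "A"),
]
merged_turns = merge_adjacent_turns(example_turns)
```

Each merged turn could then be cut from the audio and passed to Whisper as one longer chunk. As noted above, this does nothing for overlapping speech: a turn from another speaker that starts in between simply breaks the merge.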
-
I combined Whisper and pyannote.audio in pyannote-whisper to get "who speaks when and what" from an audio file. The strategy is very simple: for each transcribed sentence, choose the speaker who talks for most of it. Performance may degrade considerably when speech turns change quickly. Can we find a more elegant way to combine these tools? For example, run diarization first and then feed each speech turn to Whisper (though this may also cause problems in overlapping regions).
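For readers unfamiliar with the "major speaker" strategy, the following is a rough sketch of one way to read it, not the actual pyannote-whisper code. It assumes Whisper segments and diarization turns have been reduced to plain tuples; `dominant_speaker` and `label_segments` are hypothetical helper names. In a real pipeline the turns would likely come from iterating a pyannote.audio diarization result and the segments from the `segments` list of a Whisper transcription.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumed shapes, chosen for illustration only (times in seconds).
TextSegment = Tuple[float, float, str]     # (start, end, text) from the ASR step
SpeakerTurn = Tuple[float, float, str]     # (start, end, speaker) from diarization


def dominant_speaker(seg_start: float, seg_end: float,
                     turns: List[SpeakerTurn]) -> Optional[str]:
    """Return the speaker with the largest total overlap with the segment."""
    overlap: Dict[str, float] = defaultdict(float)
    for start, end, speaker in turns:
        shared = min(seg_end, end) - max(seg_start, start)
        if shared > 0:
            overlap[speaker] += shared
    if not overlap:
        return None  # no diarized speech inside this segment
    return max(overlap, key=lambda name: overlap[name])


def label_segments(segments: List[TextSegment],
                   turns: List[SpeakerTurn]) -> List[Tuple[Optional[str], str]]:
    """Attach the dominant speaker to each transcribed segment."""
    return [(dominant_speaker(s, e, turns), text) for s, e, text in segments]


demo_turns = [(0.0, 2.0, "A"), (2.0, 5.0, "B")]
demo_segments = [(0.0, 1.8, "hello"), (1.5, 4.0, "hi there"), (6.0, 7.0, "bye")]
labelled = label_segments(demo_segments, demo_turns)
```

The second demo segment spans both speakers and is assigned wholly to the one with more overlap, which illustrates why fast speaker changes hurt this approach: the words of the minority speaker in that sentence are attributed to the wrong person.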