Improve asr (whisper) with speaker diarization result #1144

yinruiqing · 2022-11-10T01:03:14Z

I combined Whisper and pyannote.audio in pyannote-whisper to get "who speaks when, and what" from an audio file. The strategy is very simple: assign each transcribed sentence to its major speaker, i.e. the speaker who talks for most of that sentence. Performance may degrade considerably when speakers take turns quickly, since a short turn inside a sentence gets attributed to the wrong speaker. Can we find a more elegant way to combine these tools? For example, run diarization first and then feed each speech turn to Whisper (though this may also cause problems where speech overlaps).
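To make the strategy concrete, here is a minimal sketch of the major-speaker assignment. It is not the actual pyannote-whisper code: the function name `assign_majority_speaker`, the plain-tuple data layout, and the sample timestamps are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_majority_speaker(sentences, turns):
    """Label each sentence with the speaker who overlaps it the longest.

    sentences: list of (start, end, text) tuples, times in seconds
    turns:     list of (start, end, speaker) tuples, times in seconds
    (Both layouts are hypothetical, chosen to keep the sketch self-contained.)
    """
    labeled = []
    for s_start, s_end, text in sentences:
        overlap = defaultdict(float)
        for t_start, t_end, speaker in turns:
            # Length of the intersection of the two intervals (<= 0 if disjoint).
            shared = min(s_end, t_end) - max(s_start, t_start)
            if shared > 0:
                overlap[speaker] += shared
        # None when no diarized turn intersects the sentence.
        speaker = max(overlap, key=overlap.get) if overlap else None
        labeled.append((s_start, s_end, speaker, text))
    return labeled


# Hypothetical example data.
sentences = [(0.0, 4.0, "Hello, how are you?"), (4.0, 6.0, "Fine, thanks.")]
turns = [(0.0, 3.5, "SPEAKER_00"), (3.5, 6.0, "SPEAKER_01")]
result = assign_majority_speaker(sentences, turns)
```

In this example the first sentence is labeled SPEAKER_00 even though SPEAKER_01 is already talking during its last 0.5 s, which is the kind of error that gets worse with fast speech turns.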

hbredin (Maintainer) · 2022-11-10T08:46:33Z

I tried this here.

One way to improve it would be to merge adjacent speech turns of the same speaker so that Whisper can benefit from more context.

This does not solve the overlapping speech issue, however.
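As a rough illustration of merging adjacent speech turns of the same speaker, here is a minimal sketch. The function name `merge_adjacent_turns`, the `max_gap` threshold, and the tuple layout are assumptions for illustration, not part of pyannote.audio or Whisper.

```python
def merge_adjacent_turns(turns, max_gap=0.5):
    """Merge consecutive turns of the same speaker separated by a short pause.

    turns:   list of (start, end, speaker) tuples, times in seconds
    max_gap: hypothetical threshold; pauses longer than this keep turns apart
    """
    merged = []
    for start, end, speaker in sorted(turns):
        if (
            merged
            and merged[-1][2] == speaker
            and start - merged[-1][1] <= max_gap
        ):
            # Same speaker as the previous turn and a short pause:
            # extend the previous turn instead of starting a new one.
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)
        else:
            merged.append((start, end, speaker))
    return merged


# Hypothetical example data.
turns = [
    (0.0, 2.0, "A"),
    (2.2, 4.0, "A"),
    (4.1, 5.0, "B"),
    (7.0, 8.0, "B"),
]
merged_turns = merge_adjacent_turns(turns)
```

Each merged turn can then be cut from the audio and transcribed as one longer segment. Turns of the same speaker are only merged when no other speaker's turn starts in between, so overlapping speech is left as it is.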
