Progress on diarisation #320

mirix · 2023-06-22T08:54:49Z

Hi,

I am using faster-whisper for diarisation, in case anyone wishes to have a look:

https://github.com/mirix/approaches-to-diarisation/tree/main

Best,

Ed

dgoryeo · 2023-06-23T18:57:44Z

Hi @mirix, am I reading the code correctly that the base assumption is that (faster_)Whisper has produced the right segments for all speakers? Am I right to think that (faster_)Whisper might not produce segments correctly, especially when there is cross-talk or background noise? If that's the case, would it be more accurate to use pyannote to detect the segments in the audio and then pass those to the Whisper transcriber? Would that make sense?
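A common way to combine the two outputs, and a simple way to see the concern raised here, is to keep Whisper's segments and give each one the speaker whose diarisation turns overlap it the most. The sketch below assumes both outputs have already been reduced to plain (start, end, ...) records; `Segment`, `Turn` and `assign_speakers` are hypothetical names, not the faster-whisper, pyannote or WhisperX APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """A transcription segment (hypothetical shape; times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarisation turn: who speaks from start to end (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, 0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Tuple[Optional[str], Segment]]:
    """Label each segment with the speaker whose turns overlap it for the longest total time."""
    labelled = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # A segment that no turn overlaps gets no speaker at all.
        speaker = max(totals, key=totals.get) if totals else None
        labelled.append((speaker, seg))
    return labelled


segments = [
    Segment(0.0, 4.0, "Hello, how are you?"),
    Segment(4.0, 9.0, "Fine, thanks. Glad to hear it."),  # spans a speaker change
]
turns = [Turn(0.0, 4.2, "A"), Turn(4.2, 6.5, "B"), Turn(6.5, 9.0, "A")]
labelled = assign_speakers(segments, turns)
for speaker, seg in labelled:
    print(speaker, seg.text)
```

The second segment contains words from both A and B but can only receive one label, so B's "Fine, thanks." ends up attributed to A.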

mirix · 2023-06-27T11:34:35Z

I have updated the repo to provide samples that make it possible to compare the standard WhisperX procedure with mine.

mirix · 2023-06-27T11:47:42Z

Hi @dgoryeo, the issue you mention does indeed occur, but pyannote does not solve it. In fact, as most open-source pipelines rely on pyannote, pyannote is often the underlying cause of it.

My impression is that the final goal of most of these tools is to work in real time with live streaming, so they tend to process the audio sequentially.

I, on the other hand, need a tool that considers the entire track holistically.

That is what I am trying to develop.
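To make the sequential versus holistic distinction concrete, here is a minimal, hypothetical sketch (not the method in the repository above). Toy 2-D vectors stand in for per-segment speaker embeddings, and `sequential_labels` / `holistic_labels` are illustrative names. The sequential version labels each segment using only the segments seen so far, as a streaming system must; the holistic version runs average-linkage agglomerative clustering over every segment of the track at once, assuming the number of speakers is known.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sequential_labels(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Online clustering: each segment is labelled once, using only earlier segments."""
    centroids: List[List[float]] = []  # running sums (cosine ignores scale, so sum ~ mean)
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            centroids[k] = [c + e for c, e in zip(centroids[k], emb)]
        else:
            k = len(centroids)  # nothing similar enough yet: open a new speaker
            centroids.append(list(emb))
        labels.append(k)
    return labels


def holistic_labels(embeddings: List[Sequence[float]], n_speakers: int) -> List[int]:
    """Offline average-linkage agglomerative clustering over the whole track."""
    sim = [[cosine(a, b) for b in embeddings] for a in embeddings]
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > n_speakers:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise similarity between the two clusters.
                s = sum(sim[p][q] for p in clusters[i] for q in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    labels = [0] * len(embeddings)
    # Number speakers in order of first appearance.
    for k, members in enumerate(sorted(clusters, key=min)):
        for m in members:
            labels[m] = k
    return labels


emb = [(1.0, 0.0), (0.0, 1.0), (0.95, 0.05), (0.05, 0.95)]
print(sequential_labels(emb), holistic_labels(emb, 2))
```

Both agree on this easy example. The difference is that the sequential labels are fixed as segments arrive and depend on their order, while the holistic clustering can reconsider every assignment once the whole track is available.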

The transcription and the synchronisation are typically fair enough.

The first problem is sentence splitting: sometimes utterances from two different speakers are glued together even when there is no actual overlap, and even when the voices are very different. It seems to happen more often with higher-pitched voices, for instance when both speakers are female, but it can also happen with a female and a high-pitched male speaker, or with two male speakers.

So the voice characteristics of each track need to be taken into account when setting the parameters.

I am not sure what can be done about this.

When this happens, diarisation typically does not work well. Sometimes it is a complete disaster.
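When word-level timestamps are available (faster-whisper's `transcribe` accepts `word_timestamps=True`), one possible mitigation for glued utterances, not necessarily what the repository does, is to re-split segments wherever the diarisation speaker changes between words. This is a minimal sketch assuming words and turns have already been reduced to plain tuples; `speaker_at` and `split_by_speaker` are hypothetical helpers. It can only be as good as the diarisation turns it relies on, which, as noted earlier in the thread, are often the weak point.

```python
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]  # (start, end, text), hypothetical shape; text already stripped
Turn = Tuple[float, float, str]  # (start, end, speaker), hypothetical shape


def speaker_at(t: float, turns: List[Turn]) -> Optional[str]:
    """Return the speaker whose turn contains time t, or None if no turn covers it."""
    for start, end, speaker in turns:
        if start <= t < end:
            return speaker
    return None


def split_by_speaker(words: List[Word], turns: List[Turn]) -> List[dict]:
    """Regroup timestamped words into utterances, starting a new one at every speaker change."""
    utterances: List[dict] = []
    for start, end, text in words:
        speaker = speaker_at((start + end) / 2, turns)  # label each word by its midpoint
        if utterances and utterances[-1]["speaker"] == speaker:
            utterances[-1]["end"] = end
            utterances[-1]["text"] += " " + text
        else:
            utterances.append({"speaker": speaker, "start": start, "end": end, "text": text})
    return utterances


words = [(0.0, 0.4, "Fine,"), (0.4, 0.9, "thanks."), (1.0, 1.3, "Glad"),
         (1.3, 1.6, "to"), (1.6, 1.8, "hear"), (1.8, 2.1, "it.")]
turns = [(0.0, 0.95, "B"), (0.95, 2.5, "A")]
utterances = split_by_speaker(words, turns)
for u in utterances:
    print(u["speaker"], u["text"])
```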

But it may also happen that two different speakers are identified as being the same even when there is no overlap at all and the transcription splitting is perfect.

It is in this last scenario that my method seems to perform better, although it is still very naive at this stage.
