Progress on diarisation #320
Replies: 3 comments
-
Hi @mirix, do I read the code right that the base assumption is that (faster_)Whisper has produced the right segments for all speakers? Am I right to think that (faster_)Whisper might not segment correctly, especially when there is crosstalk or background noise? If that's the case, would it be more accurate to use pyannote to detect the segments in the audio and then pass those to the Whisper transcriber? Would that make sense?
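To make the assumption in question concrete, here is a minimal sketch of the "transcribe first, then label" step, in which each transcript segment receives the speaker whose diarisation turn overlaps it most. This is not taken from the repo: `Span`, `overlap` and `assign_speaker` are hypothetical names, and the timestamps are invented for illustration.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # text for a transcript segment, speaker ID for a turn


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time shared by two spans (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speaker(segment: Span, turns: list[Span]) -> Optional[str]:
    """Return the speaker whose turn overlaps the segment most, or None."""
    best = max(turns, key=lambda t: overlap(segment, t), default=None)
    if best is None or overlap(segment, best) == 0.0:
        return None
    return best.label


# Invented example: two diarisation turns and two transcript segments.
turns = [Span(0.0, 2.0, "SPEAKER_00"), Span(2.0, 5.0, "SPEAKER_01")]
clean = Span(0.2, 1.8, "Hello there.")
glued = Span(1.0, 4.0, "How are you? Fine, thanks.")  # spans both turns

print(assign_speaker(clean, turns))  # SPEAKER_00
print(assign_speaker(glued, turns))  # SPEAKER_01, although two people spoke
```

The second segment shows the concern: if the transcriber glues two utterances into one segment, a single label is applied to all of it, however good the diarisation turns are.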
-
I have updated the repo to provide samples that enable one to compare the standard WhisperX procedure with mine.
-
Hi @dgoryeo,

The issue you mention does indeed occur, but pyannote does not solve it. In fact, as most open source pipelines rely on pyannote, pyannote is often the underlying cause of it.

My impression is that the final goal of most of these tools is to work in real time with live streaming, so they tend to work sequentially. I, on the other hand, need a tool that considers the entire track holistically. That is what I am trying to develop.

The transcription and the synchronisation are typically fair enough. The first problem is sentence splitting: sometimes utterances from two different speakers are glued together even when there is no actual overlap, and even when the voices are very different. It seems to happen more often with higher pitches, for instance when both speakers are female, but it can also happen with a female and a high-pitched male, or with two males. So the voice features of each track need to be taken into account for parametrisation. I am not sure what can be done about this. When this happens, diarisation typically does not work well. Sometimes it is a complete disaster.

But it may also happen that two different speakers are identified as being the same even when there is no overlap at all and the transcription splitting is perfect. It is in this last scenario that my method seems to perform better. But it is very naive at this stage.
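As an illustration of the splitting problem only, and not of the method in the repo, here is a minimal sketch of one common mitigation: re-splitting a glued segment at speaker changes using word-level timestamps. It assumes word timings are available (faster-whisper can emit them with `word_timestamps=True`) and that the speaker turns are reliable, which, as noted above, is not guaranteed. `Word`, `Turn`, `speaker_at` and `split_by_speaker` are hypothetical names and the data is invented.

```python
from itertools import groupby
from typing import NamedTuple


class Word(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


class Turn(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    speaker: str


def speaker_at(time: float, turns: list[Turn]) -> str:
    """Speaker of the first turn containing `time`, or "UNKNOWN"."""
    for turn in turns:
        if turn.start <= time < turn.end:
            return turn.speaker
    return "UNKNOWN"


def split_by_speaker(words: list[Word], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each word by its midpoint, then merge consecutive same-speaker words."""
    labelled = [(speaker_at((w.start + w.end) / 2, turns), w.text) for w in words]
    return [
        (speaker, " ".join(text for _, text in group))
        for speaker, group in groupby(labelled, key=lambda item: item[0])
    ]


# Invented example: one glued segment, expressed as timed words.
turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 5.0, "SPEAKER_01")]
words = [
    Word(1.0, 1.3, "How"), Word(1.3, 1.5, "are"), Word(1.5, 1.9, "you?"),
    Word(2.1, 2.6, "Fine,"), Word(2.6, 3.0, "thanks."),
]

for speaker, text in split_by_speaker(words, turns):
    print(f"{speaker}: {text}")
```

This only moves the difficulty: the split is as good as the turn boundaries, so it does nothing for the case where two different speakers are identified as the same one.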
-
Hi,
I am using faster-whisper for diarisation, if anyone wishes to have a look:
https://github.com/mirix/approaches-to-diarisation/tree/main
Best,
Ed