Speaker identification with the diarization #1589

Bennoo · 2023-12-14T14:22:28Z

Bennoo
Dec 14, 2023

Hello, I would like to know if it is possible to create a collection of voices so that, when pyannote performs diarization, it compares each speaker against the collection and, when one matches, labels that speaker with the matching ID.
This would be great when doing transcription with diarization.

hbredin · 2023-12-14T14:54:24Z

hbredin
Dec 14, 2023
Maintainer

Not out of the box -- but this option should get you closer to this goal.

deanm0000 · 2024-01-12T00:35:25Z

deanm0000
Jan 12, 2024

@hbredin Am I correct that the option you pointed to just lets us change the labels of a completed analysis? I'm not sure whether this was OP's intention, so I'd like to ask a clarifying question.

Suppose today we have a 2-hour file that we want to analyze, but we expect to have another 2-hour file next week with many of the same voices. Is there a way to retain the signature (I don't know the right term) of each voice this week so that next week, when we get and analyze the new file, the labels are the same?

In the absence of a direct way to do this, would the best approach be to concatenate the second file to the first one and then analyze all 4 hours in the second week?
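
To make the "same labels next week" idea concrete, here is a minimal sketch. It assumes you have somehow obtained one embedding vector per speaker for each file; how to get those vectors out of pyannote is outside this sketch. The function name `relabel`, the toy 3-dimensional vectors, and the placeholder labels are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def relabel(new_speakers, known_speakers):
    """Map each new label to the most similar label from an earlier file."""
    mapping = {}
    for new_label, new_vec in new_speakers.items():
        mapping[new_label] = max(
            known_speakers,
            key=lambda name: cosine_similarity(new_vec, known_speakers[name]),
        )
    return mapping

# Toy vectors standing in for per-speaker embeddings (assumed, not real data).
week1 = {"SPEAKER_00": [0.9, 0.1, 0.0], "SPEAKER_01": [0.0, 0.2, 0.9]}
week2 = {"SPEAKER_00": [0.1, 0.1, 0.8], "SPEAKER_01": [0.8, 0.2, 0.1]}

mapping = relabel(week2, week1)
print(mapping)
```

Note that this naive version always picks some earlier label, so a voice that only appears in the second file would still be assigned to someone, and two new speakers can end up mapped to the same earlier label.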

PhilipAmadasun · 2024-03-04T02:19:21Z

PhilipAmadasun
Mar 4, 2024

@Bennoo I've been trying to figure out how long each voice clip that I save in a collection has to be. Or does that even matter? How long is each clip you recorded in your collection? What I'm doing is extracting the embedding from a voice I saved in the collection and then comparing it, using cosine similarity, with the embedding of the voice I want to identify.
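
For reference, the comparison described above can be sketched in plain Python. This assumes each embedding is available as a flat sequence of floats; the toy vectors below are made up and far shorter than real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for a stored embedding and two query embeddings.
stored = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(stored, same_direction))
print(cosine_similarity(stored, orthogonal))
```

Because the score depends only on direction, scaling an embedding does not change it; higher values mean the two vectors are more alike.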

Bennoo · 2024-03-04T08:31:42Z

Bennoo
Mar 4, 2024
Author

@PhilipAmadasun That's exactly what I had in mind: being able to save all embeddings in a database and compare a new clip's speakers against the whole database, in addition to the new embeddings from the clip.
Imagine doing that with conference call recordings from the same company (or a podcast, or a TV show), using an embedding vector database of the company's employees' voices. Then you would be able to identify who is speaking.
I was originally asking whether this is possible with pyannote.
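
A rough sketch of such a database lookup, under stated assumptions: the "database" is just a dict, each person has a few enrollment embeddings, and those are averaged into one reference vector per person. The averaging step is an assumption of this sketch, not something the thread or pyannote prescribes, and the names `centroid` and `rank_candidates` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def rank_candidates(query, collection):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, cosine_similarity(query, ref)) for name, ref in collection.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy enrollment data: several made-up embeddings per known person.
clips = {
    "alice": [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]],
    "bob": [[0.0, 1.0, 0.1], [0.2, 0.8, 0.1]],
}
collection = {name: centroid(vectors) for name, vectors in clips.items()}

ranking = rank_candidates([0.1, 0.9, 0.0], collection)
print(ranking[0][0])
```

Returning the full ranking rather than only the top name makes it easier to see how close the runner-up was, which matters when deciding whether a match is trustworthy.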

PhilipAmadasun · 2024-03-04T19:42:28Z

PhilipAmadasun
Mar 4, 2024

@Bennoo You can't do it out of the box, but with some extra coding you can pull it off. I don't mean that you have to modify the pyannote source code; I just mean you can use pyannote creatively in your own code to do what you need. I think the collection of embeddings can be stored in a pickle file for easy lookup. In my tests, it's okay at identifying the voices of people stored in my collection. The problem comes when the voice belongs to an unknown speaker: I don't know what threshold to use for cosine similarity. If a stranger's voice compared against John's voice results in a cosine similarity of 0.7, should the threshold be 0.6 so the program does not count it as John's voice? But then someone else who is not Sarah will have a 0.58 similarity with Sarah. It all seems very arbitrary to me. That's why I'm asking how long each voice clip is, which might help me.
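
As an illustration of the pickle storage and the unknown-speaker check, here is a minimal sketch. It treats higher scores as closer matches (cosine similarity, not cosine distance). The `identify` function, the file name, the toy vectors, and especially the threshold value are hypothetical; 0.8 is a placeholder, not a recommendation.

```python
import math
import os
import pickle
import tempfile

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(query, collection, threshold):
    """Return the best-matching name, or None when no score reaches the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in collection.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Toy collection: one made-up reference embedding per known person.
collection = {"john": [1.0, 0.0, 0.0], "sarah": [0.0, 1.0, 0.0]}

# Round-trip the collection through a pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "voices.pkl")
with open(path, "wb") as f:
    pickle.dump(collection, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

THRESHOLD = 0.8  # placeholder value, to be tuned on your own recordings

print(identify([0.9, 0.1, 0.0], loaded, THRESHOLD))  # close to john's vector
print(identify([0.6, 0.6, 0.5], loaded, THRESHOLD))  # not close to anyone
```

One way to make the threshold less arbitrary is to score pairs of clips you already know to be the same speaker and pairs you know to be different speakers, then pick a value that separates the two groups on your own data. Also note that pickle files should only be loaded from sources you trust.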

PhilipAmadasun · 2024-03-04T19:43:51Z

PhilipAmadasun
Mar 4, 2024

@Bennoo This is a link to a question I asked that relates to this
