Replies: 6 comments
-
Not out of the box, but this option should get you closer to this goal.
-
@hbredin, am I correct that the option you pointed to just lets us change the labels of a completed analysis? I'm not sure that was OP's intention, so I'd like to ask a clarifying question. Suppose that today we have a 2-hour file to analyze, and we expect another 2-hour file next week with many of the same voices. Is there a way to retain the (I don't know the right term) signature of each voice this week, so that when we analyze the new file next week, the labels are the same? In the absence of a direct way to do this, would the best approach be to concatenate the second file to the first and analyze all 4 hours in the second week?
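One way to keep labels stable across sessions without concatenating the files is to keep one reference embedding per speaker from last week's run and map this week's diarization labels onto them. The sketch below is a plain-Python illustration, not a pyannote API: it assumes you already have one embedding per speaker label for each session (the dicts `previous` and `current`), and `match_sessions` and the 0.5 threshold are hypothetical. It pairs labels greedily, most similar first, so two new speakers cannot both claim the same old label.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def match_sessions(previous, current, threshold=0.5):
    """Map this session's speaker labels onto last session's labels.

    previous, current: dict of label -> embedding (list of floats).
    threshold: hypothetical minimum similarity for two labels to count as
    the same voice; it would need tuning for a real embedding model.
    """
    # Score every (new label, old label) pair, highest similarity first.
    pairs = sorted(
        ((cosine(current[c], previous[p]), c, p) for c, p in product(current, previous)),
        reverse=True,
    )
    mapping, used = {}, set()
    for score, c, p in pairs:
        if score < threshold:
            break  # every remaining pair is even less similar
        if c in mapping or p in used:
            continue  # keep the assignment one-to-one
        mapping[c] = p
        used.add(p)

    # Voices not heard last session get a fresh label that clashes with nothing.
    taken = set(previous) | set(mapping.values())
    k = 0
    for c in sorted(current):
        if c in mapping:
            continue
        while f"SPEAKER_{k:02d}" in taken:
            k += 1
        mapping[c] = f"SPEAKER_{k:02d}"
        taken.add(mapping[c])
    return mapping


# Made-up 2-D embeddings, just to show the behaviour.
previous = {"SPEAKER_00": [1.0, 0.0], "SPEAKER_01": [0.0, 1.0]}
current = {"SPEAKER_00": [0.1, 0.9], "SPEAKER_01": [0.95, 0.05], "SPEAKER_02": [-1.0, 0.0]}
print(match_sessions(previous, current))
# → {'SPEAKER_01': 'SPEAKER_00', 'SPEAKER_00': 'SPEAKER_01', 'SPEAKER_02': 'SPEAKER_02'}
```

Greedy matching is simpler than an optimal assignment (such as the Hungarian algorithm) and can be suboptimal when several voices sound alike. If the diarization output is a pyannote.core `Annotation`, a mapping like this should be usable with its `rename_labels(mapping=...)` method.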
-
@Bennoo I've been trying to figure out how long each voice clip I save in the collection needs to be. Or does that even matter? How long is each clip you recorded for your collection? What I'm doing is extracting the embedding from a voice saved in the collection and comparing it with the embedding of the voice I want to identify, using cosine similarity.
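The comparison described above (embedding of a saved clip vs. embedding of the voice to identify) comes down to a cosine similarity. Here is a minimal stdlib-only sketch, assuming each embedding is already available as a plain list of floats (for example, converted from whatever array the embedding model returns); the vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two embedding vectors, in the range [-1, 1].

    Higher values mean the embedding model considers the two voices more alike.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an all-zero embedding")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real speaker embeddings have far more dimensions.
stored = [0.2, 0.8, 0.1]    # embedding of a clip saved in the collection
query = [0.25, 0.75, 0.05]  # embedding of the voice to identify
print(round(cosine_similarity(stored, query), 3))
```

Because cosine similarity ignores vector length, the embeddings do not need to be normalized first; only their direction matters.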
-
@PhilipAmadasun That's exactly what I had in mind: saving all the embeddings in a database and comparing the speakers in a new clip against the whole database, in addition to the new embeddings from the clip itself.
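A database like this can be sketched as a dict mapping each enrolled name to a list of embeddings, persisted with `pickle` (the storage option suggested in this thread). This is a hypothetical, stdlib-only illustration: `DB_PATH`, `enroll`, and `rank_speakers` are made-up names, and scoring each speaker by their best-matching stored clip is one choice among several (averaging the clips into a single centroid is another).

```python
import math
import pickle
from pathlib import Path

# Hypothetical file name for the collection; any path works.
DB_PATH = Path("voice_collection.pkl")


def load_db(path=DB_PATH):
    """Return {speaker_name: [embedding, ...]}, or an empty dict if no file exists yet."""
    if not path.exists():
        return {}
    with path.open("rb") as f:
        return pickle.load(f)


def save_db(db, path=DB_PATH):
    """Write the whole collection back to disk."""
    with path.open("wb") as f:
        pickle.dump(db, f)


def enroll(db, name, embedding):
    """Add one more reference embedding for `name` (several clips per person are allowed)."""
    db.setdefault(name, []).append(list(embedding))


def _cosine(a, b):
    # Assumes equal-length, non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_speakers(db, query):
    """Score `query` against every enrolled speaker, best match first.

    Each speaker's score is the best similarity over all of their stored clips.
    """
    scores = {name: max(_cosine(query, ref) for ref in refs) for name, refs in db.items() if refs}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up 2-D embeddings for illustration.
db = {}
enroll(db, "john", [1.0, 0.0])
enroll(db, "john", [0.8, 0.6])  # a second clip of John
enroll(db, "sarah", [0.0, 1.0])
print(rank_speakers(db, [0.9, 0.1]))
# save_db(db) would persist the collection for the next session.
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code. `rank_speakers` deliberately returns every score rather than a yes/no answer, so a decision threshold can be applied as a separate step.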
-
@Bennoo You can't do it out of the box, but with some extra coding you can pull it off. I don't mean that you have to modify the pyannote source code, just that you can use pyannote creatively in your own code to do what you need. I think the collection of embeddings can be stored in a pickle file for easy lookup. In my tests, it's okay at identifying the voices of people stored in my collection. The problem comes when the voice belongs to an unknown speaker: I don't know what threshold to use for cosine similarity. If a stranger's voice compared against John's voice gives a cosine similarity of 0.7, should the threshold be 0.6 so the program does not count it as John's voice? But then someone else who is not Sarah will have a 0.58 similarity with Sarah. It all seems very arbitrary to me. That's why I'm asking how long each voice clip is, which might help me.
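On the threshold question, one common way to make the choice less arbitrary (a swapped-in technique, not something pyannote does for you) is to collect scores for pairs you know are the same person (`genuine`) and pairs you know are different people (`impostor`), then pick the cut-off that makes the fewest mistakes on them. A minimal sketch; `pick_threshold` is a hypothetical name, the score lists are invented for illustration, and real values depend on the embedding model, clip length, and recording conditions.

```python
def pick_threshold(genuine, impostor):
    """Choose a cosine-similarity threshold from labelled comparison scores.

    genuine:  similarities between two clips of the SAME person.
    impostor: similarities between clips of DIFFERENT people.
    A score >= threshold is treated as "same person". Returns the threshold
    with the fewest total mistakes (false rejects + false accepts) and that
    error count; ties go to the lowest threshold.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    # Only observed scores (plus one value above all of them, meaning
    # "reject everything") can change the error count, so try just those.
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1e-6)
    best_t, best_err = None, None
    for t in candidates:
        false_rejects = sum(s < t for s in genuine)    # same person, rejected
        false_accepts = sum(s >= t for s in impostor)  # stranger, accepted
        err = false_rejects + false_accepts
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


# Invented scores for illustration only.
genuine = [0.82, 0.75, 0.90, 0.68]
impostor = [0.30, 0.55, 0.70, 0.20]
print(pick_threshold(genuine, impostor))  # → (0.68, 1)
```

If accepting a stranger as John is worse than missing John, the two error counts can be weighted differently before summing. A voice that scores below the chosen threshold against every enrolled speaker can then be reported as unknown.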
-
@Bennoo This is a link to a question I asked that relates to this.
-
Hello, I would like to know whether it is possible to create a collection of voices so that, when pyannote runs diarization, it compares each speaker with the collection and, if one matches, labels that speaker with the stored ID.
This would be great for transcription with diarization.
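For the workflow described here, the final step would be to replace each anonymous diarization label with a known ID when it matches the collection closely enough, and keep the anonymous label otherwise, before the turns go into a transcript. A hypothetical plain-Python sketch: it assumes the turns are `(start, end, label)` tuples and that you already have one embedding per diarization label; `label_turns` and the 0.6 default threshold are placeholders, not pyannote API or recommended values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_turns(turns, label_embeddings, collection, threshold=0.6):
    """Rename diarization labels to known IDs where the match is good enough.

    turns: list of (start_seconds, end_seconds, diarization_label).
    label_embeddings: diarization_label -> embedding.
    collection: known_id -> reference embedding.
    threshold: hypothetical cut-off; below it the anonymous label is kept.
    """
    names = {}
    for label, emb in label_embeddings.items():
        best_id, best = None, -1.0
        for known_id, ref in collection.items():
            score = cosine(emb, ref)
            if score > best:
                best_id, best = known_id, score
        names[label] = best_id if best_id is not None and best >= threshold else label
    return [(start, end, names.get(label, label)) for start, end, label in turns]


# Made-up data for illustration.
turns = [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01")]
label_embeddings = {"SPEAKER_00": [0.9, 0.1], "SPEAKER_01": [0.0, -1.0]}
collection = {"john": [1.0, 0.0], "sarah": [0.0, 1.0]}
for start, end, who in label_turns(turns, label_embeddings, collection):
    print(f"[{start:5.1f}s - {end:5.1f}s] {who}")
```

Unlike a one-to-one assignment, this lets two diarization labels resolve to the same ID, which can be useful when diarization splits one person into two clusters but can also hide a genuine mix-up.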