Only two speakers in an audio but pyannote assigned a new speaker for each segment. #1591
Hieroglyph17 started this conversation in General
-
It looks like you are using a VoiceActivityDetection pipeline, whereas what you need is a SpeakerDiarization pipeline.
-
Hi Hervé,
I presume you mean https://huggingface.co/pyannote/speaker-diarization-3.1?
Many thanks,
Christoph
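For anyone landing here with the same problem, here is a minimal sketch of what switching to the diarization pipeline might look like. It assumes pyannote.audio 3.1 is installed, that you have accepted the model's conditions on Hugging Face, and that you have an access token. The function names `diarize` and `speaking_time` and the `token` argument are hypothetical, introduced only for illustration; `num_speakers=2` is passed because the audio is known to contain two speakers.

```python
from collections import defaultdict


def diarize(path, token, num_speakers=2):
    """Run speaker diarization and return (start, end, speaker) tuples.

    Assumes pyannote.audio 3.1 and a valid Hugging Face token.
    """
    # Imported lazily so the helper below can be used without pyannote installed.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=token,
    )
    # Telling the pipeline how many speakers to expect, when known.
    diarization = pipeline(path, num_speakers=num_speakers)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]


def speaking_time(turns):
    """Sum the speech duration per speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)
```

Calling `speaking_time(diarize("interview.wav", token))` (with a hypothetical file name) should then give one entry per detected speaker, which makes it easy to check that only two labels come back.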
-
I got a bunch of warnings when running the code below, but I presume I can ignore them. However, the output assigns a new speaker to each segment, even though the audio contains only two speakers. The audio file is a professional-quality interview.
Can you help?
Christoph
Code:
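from pyannote.audio import Pipeline

# Load the pipeline described by a local config.yaml
pipeline = Pipeline.from_pretrained("config.yaml")
DEMO_FILE = {'uri': 'blabal', 'audio': '/Users/christophschnelle/Documents/Larry Sinclair Obama_02.wav'}
dz = pipeline(DEMO_FILE)
# Save the full result as text
with open("diarization.txt", "w") as text_file:
    text_file.write(str(dz))
# Print the first 100 (segment, track, label) tuples
print(*list(dz.itertracks(yield_label=True))[:100], sep="\n")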
Output:
(...)
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.2. To apply the upgrade to your files permanently, run
python -m pytorch_lightning.utilities.upgrade_checkpoint pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.2. Bad things might happen unless you revert torch to 1.x.
(<Segment(0.605802, 12.9096)>, 'A', 'SPEECH')
(<Segment(13.0973, 25.1109)>, 'B', 'SPEECH')
(<Segment(25.2645, 27.8413)>, 'C', 'SPEECH')
(<Segment(28.0802, 51.971)>, 'D', 'SPEECH')
(<Segment(53.3362, 59.9061)>, 'E', 'SPEECH')
(<Segment(61.0495, 69.0529)>, 'F', 'SPEECH')
(<Segment(70.4181, 77.7389)>, 'G', 'SPEECH')
(<Segment(78.4386, 86.5785)>, 'H', 'SPEECH')
(<Segment(87.1587, 89.2065)>, 'I', 'SPEECH')
(<Segment(89.3942, 98.9846)>, 'J', 'SPEECH')
(<Segment(99.4283, 102.21)>, 'K', 'SPEECH')
(<Segment(103.951, 126.766)>, 'L', 'SPEECH')
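A note on reading the output above: `itertracks(yield_label=True)` yields `(segment, track, label)` tuples, so the letters A, B, C, ... are track names, while the label is 'SPEECH' on every row. That is what one would expect from voice activity detection rather than diarization. A small sketch, using the first three rows copied by hand from the output above (start and end times written as plain floats), makes the distinction explicit:

```python
# First three rows of the output above, as (start, end, track, label).
rows = [
    (0.605802, 12.9096, "A", "SPEECH"),
    (13.0973, 25.1109, "B", "SPEECH"),
    (25.2645, 27.8413, "C", "SPEECH"),
]

# Track names are unique per segment; they are not speaker identities.
tracks = {track for _, _, track, _ in rows}
# The label is the field that would carry the speaker name after diarization.
labels = {label for _, _, _, label in rows}

print("distinct tracks:", sorted(tracks))
print("distinct labels:", sorted(labels))
```

With a diarization pipeline, the label field would instead be expected to hold speaker names, and the set of distinct labels should have one entry per speaker.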