This would need a bit of work on your side, but it is possible, yes.
I use a frame-level x-vector embedding instead of SincNet. On my speaker change detection test set, it works better.
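For anyone wanting to try this, here is a rough sketch of what swapping the raw-audio front end for precomputed frame-level embeddings might look like in PyTorch. It is not the actual segmentation model from this project: the class name `EmbeddingSegmenter`, the 512-dimensional embedding size, and the layer sizes are all assumptions made for illustration, and the x-vector extractor itself is assumed to run separately and produce one embedding per frame.

```python
import torch
from torch import nn


class EmbeddingSegmenter(nn.Module):
    """Hypothetical sequence labeller that consumes frame-level embeddings
    (e.g. x-vectors) instead of raw audio passed through SincNet."""

    def __init__(self, embedding_dim: int = 512, hidden_size: int = 128):
        super().__init__()
        # The recurrent layer reads the embedding sequence directly,
        # so no waveform front end is needed.
        self.lstm = nn.LSTM(
            input_size=embedding_dim,
            hidden_size=hidden_size,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        # Bidirectional LSTM outputs 2 * hidden_size features per frame.
        self.classifier = nn.Linear(2 * hidden_size, 1)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, num_frames, embedding_dim)
        hidden, _ = self.lstm(embeddings)
        # One score in [0, 1] per frame, e.g. a speaker-change probability.
        return torch.sigmoid(self.classifier(hidden))


if __name__ == "__main__":
    model = EmbeddingSegmenter()
    # Random tensor standing in for 2 clips of 50 frames of x-vectors.
    fake_xvectors = torch.randn(2, 50, 512)
    scores = model(fake_xvectors)
    print(scores.shape)  # (batch, num_frames, 1)
```

The frame rate of the labels then follows the frame rate of the embedding extractor, so the training targets would need to be resampled to match.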
I was wondering whether it is possible to use x-vectors instead of raw audio as the segmentation model's input.
Might this make the model more robust?