Prosody Control #983
-
These papers often require a lot of speech data and are not very useful in cases like VOICEVOX ;-)
-
Awesome, thanks for the paper reference, pretty helpful. So you already know about style tokens; they can do a lot of things besides prosody control. For the current VOICEVOX speakers, for example, they could enable continuous control over styles, like 50% あまあま and 30% つん. There must be a reason you're not using them at the moment, maybe the low-resource problem?
Have you actually tried this out in some experiments? How about data augmentation or pre-training? I also wonder whether this would actually work for low-resource speakers.
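To make the continuous-control idea concrete, here is a rough sketch of what blending learned style embeddings at inference time could look like. The style names and the `style_embeddings` dict are made up for illustration, not an actual VOICEVOX API:

```python
# Rough sketch of continuous style mixing at inference time, assuming the
# model exposes one learned embedding per named style (the names and the
# style_embeddings dict below are hypothetical, not real VOICEVOX APIs).
import numpy as np

style_embeddings = {
    "あまあま": np.random.randn(256).astype(np.float32),  # placeholder vectors
    "つん": np.random.randn(256).astype(np.float32),
    "ノーマル": np.random.randn(256).astype(np.float32),
}

def mix_styles(weights: dict[str, float]) -> np.ndarray:
    """Blend style embeddings, e.g. {"あまあま": 0.5, "つん": 0.3, "ノーマル": 0.2}."""
    total = sum(weights.values())
    # Normalize the weights and take a weighted sum of the style vectors.
    mixed = sum(w / total * style_embeddings[name] for name, w in weights.items())
    return mixed  # feed this to the acoustic model in place of a single style

blended = mix_styles({"あまあま": 0.5, "つん": 0.3, "ノーマル": 0.2})
```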
-
Sad 😥, thanks for sharing!
-
Hey, I'm just here to share something I picked up from browsing the papers: besides phoneme-level control (pitch, duration, energy, etc.), modern industrial TTS pipelines (arXiv:2110.12612) use both utterance-level and word-level prosody style tokens, which derive from the global style token (arXiv:1803.09017).
That system, made by Microsoft, took first prize in the Blizzard Challenge 2021 (actually, almost every system submitted to the challenge uses style tokens). Whatever acoustic model you're using, you should definitely check it out.
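For reference, here is a minimal sketch of the global-style-token idea from arXiv:1803.09017: an embedding of a reference utterance attends over a small bank of learnable tokens to produce a single style vector. Class and variable names are my own, and the reference encoder and dimensions are simplified:

```python
# Minimal GST-style layer in the spirit of arXiv:1803.09017 -- names and
# hyperparameters are illustrative, not taken from any VOICEVOX code.
import torch
import torch.nn as nn


class GlobalStyleTokens(nn.Module):
    def __init__(self, ref_dim=128, token_num=10, token_dim=256, num_heads=4):
        super().__init__()
        # Learnable bank of style tokens (token_num tokens of size token_dim).
        self.tokens = nn.Parameter(torch.randn(token_num, token_dim) * 0.3)
        self.attn = nn.MultiheadAttention(
            embed_dim=token_dim, num_heads=num_heads, batch_first=True,
        )
        self.query_proj = nn.Linear(ref_dim, token_dim)

    def forward(self, ref_embedding):
        # ref_embedding: (batch, ref_dim), e.g. the final GRU state of a
        # reference encoder run over the target mel spectrogram.
        query = self.query_proj(ref_embedding).unsqueeze(1)          # (B, 1, D)
        keys = torch.tanh(self.tokens).unsqueeze(0).expand(
            ref_embedding.size(0), -1, -1)                           # (B, T, D)
        style, weights = self.attn(query, keys, keys)
        return style.squeeze(1), weights.squeeze(1)                  # (B, D), (B, T)


# Usage: broadcast the style embedding onto the text encoder outputs.
gst = GlobalStyleTokens()
ref = torch.randn(2, 128)            # dummy reference embeddings
text_enc = torch.randn(2, 50, 256)   # dummy text encoder outputs
style, w = gst(ref)
conditioned = text_enc + style.unsqueeze(1)
```

The utterance-level and word-level prosody tokens in the Microsoft system work on the same principle, just applied at different granularities.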