HQ LJSpeech voice (finished, download available) #202
-
Thanks for doing this! I'd recommend running the English test phrases through as well (https://github.com/rhasspy/piper/tree/master/etc/test_sentences). The last 4 sentences are pangrams, which attempt to use every letter of the alphabet in a single sentence. They usually give a good hint of how the voice will perform on text outside the dataset. Regarding stopping criteria, none of the automated methods I've found work when you have multiple optimizers. Since VITS has both a generator and a discriminator loss, I'm not sure there's a good (automated) way to say a voice is "done".
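
In case it's useful to anyone doing the same, here's roughly how those test phrases could be batched through an exported voice with the piper CLI. This is only a minimal sketch: the model filename, the test-sentence filename, and the output directory are placeholders, not anything specific to this thread.

```python
import subprocess
from pathlib import Path

# Placeholders -- point these at your own exported voice and a local copy of
# the test sentences from the piper repo (etc/test_sentences).
MODEL = "en_US-ljspeech-high.onnx"
SENTENCES = Path("etc/test_sentences/test_en-us.txt")  # filename is an assumption
OUT_DIR = Path("test_output")
OUT_DIR.mkdir(exist_ok=True)

for i, line in enumerate(SENTENCES.read_text(encoding="utf-8").splitlines()):
    text = line.strip()
    if not text:
        continue
    # piper reads text on stdin and writes a WAV to --output_file.
    subprocess.run(
        ["piper", "--model", MODEL, "--output_file", str(OUT_DIR / f"sentence_{i:02d}.wav")],
        input=text.encode("utf-8"),
        check=True,
    )
```

Listening through the resulting WAVs (especially the pangrams at the end) makes it easier to compare checkpoints on the same out-of-dataset text.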
-
Ok, I just passed 1,000 epochs and updated the page with examples of what the voice is producing. Perhaps halfway there?
-
Ok, I finished training at 2,000 epochs, and here is what I ended up with. The .ckpt and voice files are dedicated to the public domain. Samples are available here.
-
As I mentioned in another topic, I'm currently training a voice from scratch on the high-quality setting using the LJSpeech dataset, which is in the public domain. When I'm done, I'll release the .ckpt and .onnx files into the public domain, too.
I'm not sure how long I'll end up letting it train; for a while yet, at least. The training docs say to watch for when certain graphs in TensorBoard "level off". I don't know exactly what that means, but I'm watching those graphs. I don't have a particularly spectacular GPU, so it'll take a while.
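
For anyone curious what "watching the graphs" looks like in code, here's the kind of rough plateau check I've been approximating by eye. It's only a sketch: it reads scalars back out of the TensorBoard event files, and the log directory and scalar tag name below are assumptions, so check the TensorBoard UI for what your run actually logs. With more than one loss curve in play, this is a hint at best, not a real stopping criterion.

```python
# Rough, hypothetical "leveling off" check: compare the average of the most
# recent window of a loss curve against the window before it.
# Requires the tensorboard package (pip install tensorboard).
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

def has_leveled_off(logdir: str, tag: str, window: int = 50, tolerance: float = 0.01) -> bool:
    acc = EventAccumulator(logdir)
    acc.Reload()
    values = [event.value for event in acc.Scalars(tag)]
    if len(values) < 2 * window:
        return False  # not enough history to judge yet
    recent = sum(values[-window:]) / window
    earlier = sum(values[-2 * window:-window]) / window
    # "Leveled off" here means less than `tolerance` relative change between windows.
    return abs(earlier - recent) / max(abs(earlier), 1e-8) < tolerance

# Example usage -- the log directory and tag name are placeholders:
# print(has_leveled_off("lightning_logs/version_0", "loss_gen_all"))
```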
If anybody is interested, I set up a page with some results as of epoch 286 this morning: TTS Voice Training Experiments