Hi @MooTheCow,
If the dataset is very small (fewer than 100 wavs), 2-3 hours could work well. For larger ones, I recommend taking advantage of the six free GPU hours that Colab offers, although lately Colab has had a bug where it disconnects Drive after 4 hours of execution.
PiperTTS is fantastic. Well done!
I have a few questions about the training colab notebook.
What determines model size?
I am fine-tuning from lessac-high. Does that determine the model size? I am also setting the model Quality to high; if I select something else, will that drop the quality to that level? Finally, when I export the model with the Exporter notebook, I can choose a quality there too. Does that change the model size, or is it just used to name the model? Ideally I'd like to train a high model, then quickly convert it into a medium one to compare render time and fidelity. I'd rather not go through the entire training process again for a medium model if I don't need to, and I'm not sure whether I would then have to train from a medium checkpoint instead of a high one.
How long to train for?
There is no information given by the notebook other than a log message:
DEBUG:fsspec.local:open file: /content/drive/MyDrive/colab/piper/Test/lightning_logs/version_0/checkpoints/last.ckpt
There is no indication of how far the training has progressed, which makes it hard to judge how long to leave it running. I've had success fine-tuning on 100 × 20-second wavs for 3.5 hours, but I don't know whether the model would improve with longer training or whether that was already overkill. Colab starts to get antsy around that point and wants to kick me off, and I've completely failed to get WSL to recognise CUDA so far, so local training isn't an option.
If Colab kills the process for running too long, can I still use last.ckpt (e.g. with the Extraction notebook), or is it now invalid?
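For what it's worth, resuming from a checkpoint is also how fine-tuning is normally started in the first place. A minimal sketch, assuming the piper_train invocation documented in Piper's TRAINING.md (the dataset path is taken from the log line above; the batch size and epoch count here are illustrative, not values from the notebook):

```shell
# Hypothetical resume command, modelled on Piper's TRAINING.md;
# adjust paths and flag values to match your own Drive layout.
python3 -m piper_train \
    --dataset-dir /content/drive/MyDrive/colab/piper/Test \
    --accelerator 'gpu' \
    --devices 1 \
    --batch-size 16 \
    --validation-split 0.0 \
    --num-test-examples 0 \
    --max_epochs 6000 \
    --resume_from_checkpoint /content/drive/MyDrive/colab/piper/Test/lightning_logs/version_0/checkpoints/last.ckpt \
    --checkpoint-epochs 1 \
    --precision 32
```

Since last.ckpt is written at a checkpoint boundary rather than on shutdown, it should normally still be loadable this way even after Colab terminates the session, though I can't speak for what the Extraction notebook expects.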
Other information and community?
I came here from CoquiTTS, which has a useful Discord and is a great model, though in my opinion not as good as Piper. Is there anywhere other than this GitHub (or, worse, comments on YouTube videos) to discuss PiperTTS? And is there anywhere people are sharing voice models, other than links to random HuggingFace pages?