This model is built on ModelScope KAN-TTS and is a fine-tuned version of a Korean TTS model.
Use modify_sr to resample the audio to 16000 Hz.
Use korean_recognition to transcribe the Korean audio automatically.
Use text2phone to convert the text into the required form.
Copy (cp) the raw_metafile into the labeled file.
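As a rough illustration of what the resampling step does, here is a minimal sketch that converts a 16-bit mono PCM WAV file to 16000 Hz using only the Python standard library. It is not the modify_sr script from this repository; the function name resample_wav_16k is hypothetical, and plain linear interpolation is used for simplicity. Linear interpolation applies no anti-aliasing filter, so for real training data a proper resampler is likely a better choice when downsampling.

```python
import array
import sys
import wave


def resample_wav_16k(src_path, dst_path, target_rate=16000):
    """Resample a 16-bit mono PCM WAV file to target_rate by linear interpolation.

    Hypothetical helper for illustration only; returns the number of output samples.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        src_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    n_out = len(samples) * target_rate // src_rate
    step = src_rate / target_rate  # source samples advanced per output sample
    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring source samples.
        out.append(round(samples[left] * (1.0 - frac) + samples[right] * frac))

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
    return n_out
```

Calling resample_wav_16k("in.wav", "out.wav") writes a 16000 Hz copy of in.wav; both file names are placeholders.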
Extract the relevant features with the command below. Before running it, edit the raw_metafile location in data_process_korean.py (line 113) and the raw folder in audio_processor (line 717).
python kantts/preprocess/data_process_korean.py --voice_input_dir data/hanyu2h_label --voice_output_dir data/hanyu2h_labeled --audio_config kantts/configs/audio_config_se_16k.yaml --speaker F7 --se_model speech_personal_sambert-hifigan_nsf_tts_zh-cn_pretrain_16k/basemodel_16k/speaker_embedding/se.model
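Because the feature extraction command uses a 16k audio config, it can help to confirm beforehand that every WAV file in the input directory is actually sampled at 16000 Hz. The following is a minimal sketch of such a check, not part of KAN-TTS: the function name find_non_16k_wavs is hypothetical, and it assumes the audio is stored as PCM WAV files readable by Python's standard wave module somewhere under the given directory.

```python
import wave
from pathlib import Path


def find_non_16k_wavs(root_dir, expected_rate=16000):
    """Return (path, rate) pairs for WAV files under root_dir whose rate differs.

    Hypothetical helper; scans root_dir recursively for *.wav files.
    """
    mismatched = []
    for wav_path in sorted(Path(root_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatched.append((str(wav_path), rate))
    return mismatched
```

For example, find_non_16k_wavs("data/hanyu2h_label") would list any files in the input directory that still need resampling; an empty list means all WAV files found are at 16000 Hz.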
Edit the am_train file using raw_metafile, then run the following command to train sambert:
CUDA_VISIBLE_DEVICES=1,2 python kantts/bin/train_sambert.py --model_config speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/sambert/config.yaml --root_dir data/hanyu_label data/tts1_label data/kss_label --stage_dir training_stage/test_male_ptts_sambert_ckpt --resume_path speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/sambert/ckpt/checkpoint_630000.pth
Preprocess the Korean dataset as described in the sambert training process above, then run the following command to train hifigan on your own voice:
CUDA_VISIBLE_DEVICES=2 python kantts/bin/train_hifigan.py --model_config speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/hifigan/config.yaml --root_dir data/korean_labeled --stage_dir training_stage_hifigan --resume_path speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/hifigan/ckpt/checkpoint_160000.pth
You can download the pretrained Korean model from ModelScope (https://modelscope.cn/models/speech_tts/speech_sambert-hifigan_tts_kyong_Korean_16k/files).
Then run the following command to synthesize speech from test.txt:
CUDA_VISIBLE_DEVICES=0 python kantts/bin/text_to_wav.py --txt test.txt --output_dir res/test_male_ptts_syn --res_zip speech_sambert-hifigan_tts_kyong_Korean_16k/resource.zip --am_ckpt speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/sambert/ckpt/checkpoint_630000.pth --voc_ckpt speech_sambert-hifigan_tts_kyong_Korean_16k/basemodel_16k/hifigan/ckpt/checkpoint_160000.pth
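The command above points --am_ckpt and --voc_ckpt at the pretrained checkpoints. To use checkpoints from your own training run instead, you need the path of the checkpoint you want. The sketch below picks the one with the highest step number. It assumes that training writes files named checkpoint_<step>.pth somewhere under the directory given as --stage_dir, matching the naming of the pretrained checkpoints; that layout is an assumption, and the function name latest_checkpoint is hypothetical.

```python
import re
from pathlib import Path

# Matches names such as checkpoint_630000.pth and captures the step number.
CKPT_NAME = re.compile(r"checkpoint_(\d+)\.pth")


def latest_checkpoint(search_dir):
    """Return the checkpoint_<step>.pth path with the highest step, or None.

    Hypothetical helper; searches search_dir recursively.
    """
    best_step, best_path = -1, None
    for path in Path(search_dir).rglob("checkpoint_*.pth"):
        match = CKPT_NAME.fullmatch(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

For example, latest_checkpoint("training_stage/test_male_ptts_sambert_ckpt") returns a path that can be passed to --am_ckpt, or None if no matching file is found.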