
Version 1.4.0rc1

@huseinzol05 huseinzol05 released this 25 Mar 16:28
  1. Starting with Malaya-Boilerplate 0.0.24, if TensorFlow is not installed locally, it is replaced with Mock TensorFlow, https://malaya-speech.readthedocs.io/en/latest/mock-tensorflow.html. We are going to focus on PyTorch from now on.
  2. Added PyTorch RNNT using TorchAudio, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt.html. It beats Google ASR on the Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
  3. Added PyTorch multi-language RNNT using TorchAudio, so you can now predict multiple languages in one audio sample, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-pt-multilanguage.html. It beats Google ASR on the Malaya-Speech Malay test set, FLEURS Malay test set and Singlish test set. Requires TorchAudio.
  4. Added more ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/stt-ctc-huggingface.html
  5. Added finetuned Whisper models, trained on the Malaya-Speech Malay train set and the IMDA Singlish train set, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
  6. Added HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/stt-seq2seq-whisper.html
  7. Added Force Alignment using PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/force-alignment-transducer-pt.html
  8. Added Force Alignment using HuggingFace ASR Seq2Seq models, https://malaya-speech.readthedocs.io/en/latest/force-alignment-seq2seq-huggingface.html
  9. Added orkid, bunga, jebat, tuah, male, female speakers for TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
  10. Added multispeaker TTS VITS, https://malaya-speech.readthedocs.io/en/latest/tts-vits-multispeaker.html
  11. Added is-clean detection, very useful if you want very clean voice activities, https://malaya-speech.readthedocs.io/en/latest/load-is-clean.html
  12. Added speaker embedding models from Nemo, with no need to install Nemo, https://malaya-speech.readthedocs.io/en/latest/load-speaker-vector-nemo.html. They are the best in terms of EER score on the VoxCeleb2 test set.
  13. Added an interface to combine multiple diarization results into a single diarization result, https://malaya-speech.readthedocs.io/en/latest/combine-longer-speaker-diarization.html
  14. Added TorchAudio streaming interface, streaming VAD, https://malaya-speech.readthedocs.io/en/latest/long-audio-vad-torchaudio.html
  15. Added TorchAudio streaming interface, streaming ASR, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
  16. Added Enformer Streaming PyTorch RNNT, https://malaya-speech.readthedocs.io/en/latest/long-audio-asr-torchaudio.html
  17. Added TorchAudio streaming interface, streaming ASR and diarization on YouTube videos, https://malaya-speech.readthedocs.io/en/latest/youtube-asr-diarization-torchaudio.html

To install it:

pip3 install malaya-speech==1.4.0rc1
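
Since several of the models above require TorchAudio, and TensorFlow is now optional, it can help to confirm which backends are present after installing. The following is a minimal sketch using only the Python standard library. The helper name `check_optional_backends` is hypothetical and not part of malaya-speech, and the import names `torch`, `torchaudio` and `tensorflow` are assumed to be the usual ones for these packages.

```python
from importlib.util import find_spec


def check_optional_backends(names=("torch", "torchaudio", "tensorflow")):
    """Return a mapping of package name -> True if it can be imported.

    Hypothetical helper for illustration; it only looks the packages up
    and does not import them.
    """
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_optional_backends().items():
        status = "installed" if available else "missing"
        print(f"{name}: {status}")
```

If `torchaudio` is reported as missing, the PyTorch RNNT models listed above will presumably not load until it is installed.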