For non-English ASR, it is best to use the large Whisper model.
Alignment models are picked automatically from the default lists based on the chosen language.
Default alignment models are currently provided and tested for {en, fr, de, es, it, ja, zh, nl}.
If the detected language is not in this list, you need to find a phoneme-based ASR model on the Hugging Face model hub and test it on your data.
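As a minimal sketch, a wrapper script can check a language code against the tested list above before calling whisperx, so that untested languages get flagged for a manual alignment-model choice. The function names `has_default_align_model` and `check_language` are hypothetical helpers for illustration and are not part of WhisperX.

```shell
#!/bin/sh
# Hypothetical helper (not part of WhisperX).
# Language codes with tested default alignment models, copied from the list above.
TESTED_LANGS="en fr de es it ja zh nl"

# Return 0 if $1 has a tested default alignment model, 1 otherwise.
has_default_align_model() {
  for lang in $TESTED_LANGS; do
    if [ "$lang" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Print a status line; warn on stderr and return 1 for untested languages.
check_language() {
  if has_default_align_model "$1"; then
    echo "ok: $1 has a tested default alignment model"
  else
    echo "warning: no tested default for $1; choose a phoneme-based model and test it" >&2
    return 1
  fi
}
```

Usage: `check_language fr && whisperx --model large --language fr examples/sample_fr_01.wav` only runs the transcription when the language is on the tested list.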
whisperx --model large --language fr examples/sample_fr_01.wav
whisperx --model large --language de examples/sample_de_01.wav
whisperx --model large --language it examples/sample_it_01.wav
whisperx --model large --language ja examples/sample_ja_01.wav
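The per-language commands above all follow one pattern. Assuming the example files keep the sample_<lang>_<NN>.wav naming used here (a convention of these examples, not something WhisperX reads), a short loop can derive `--language` from each file name. The helpers `lang_from_filename` and `transcribe_all` and the `DRY_RUN` variable are hypothetical; with `DRY_RUN=1` the loop only prints the commands instead of running them.

```shell
#!/bin/sh
# Assumption: files are named sample_<lang>_<NN>.wav, e.g. sample_fr_01.wav.

# Extract the language code from a path like examples/sample_fr_01.wav.
lang_from_filename() {
  base=$(basename "$1" .wav)   # sample_fr_01
  rest=${base#sample_}         # fr_01
  echo "${rest%%_*}"           # fr
}

# Run one whisperx command per file, or only print it when DRY_RUN=1.
transcribe_all() {
  for f in "$@"; do
    lang=$(lang_from_filename "$f")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "whisperx --model large --language $lang $f"
    else
      whisperx --model large --language "$lang" "$f"
    fi
  done
}
```

For example, `DRY_RUN=1 transcribe_all examples/sample_*.wav` lists the commands for every example file; dropping `DRY_RUN=1` runs them. Deriving the language from the file name avoids copy-paste mismatches between the `--language` flag and the input file.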