Prepare the video and audio data and extract features for the training procedure.
mkdir -p examples/misp/kws/xtxt/data
1.2 Prepare the audio feature directories for the audio model; we assume that all audio has already been converted to feature format.
# each dir contains feats.scp and labels.scp
mkdir -p examples/misp/kws/xtxt/train
mkdir -p examples/misp/kws/xtxt/dev
# each dir contains feats.scp, labels.scp and video.scp
mkdir -p examples/misp/kws/xtxt/train_av
mkdir -p examples/misp/kws/xtxt/dev_av
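Before training, it can help to confirm that each directory really holds the files listed in the comments above. The following is a minimal sketch, not part of the recipe: `check_data_dir` is a hypothetical helper, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper (not part of athena_wakeup): report expected files
# that are missing or empty in a data directory.
# Usage: check_data_dir DIR FILE...
check_data_dir() {
    dir="$1"; shift
    missing=0
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

base=examples/misp/kws/xtxt
for split in train dev; do
    # audio-only dirs need feats.scp and labels.scp
    check_data_dir "$base/$split" feats.scp labels.scp || true
    # audio-visual dirs additionally need video.scp
    check_data_dir "$base/${split}_av" feats.scp labels.scp video.scp || true
done
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard in front of a training command.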
4 kinds of models are offered now:
- conformer/transformer models using only audio or only video data
- fine-tuned conformer/transformer models using focal loss and label_smoothing
- audio-visual transformer models using both audio and video data, with 2 kinds of fusion operation
- majority vote over all models
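The majority vote in the last item can be illustrated with a small sketch. It assumes a hypothetical input format that this recipe does not define: one prediction file per model, each line holding `utt_id prediction` with the prediction being 1 (wake word) or 0. Ties are resolved to 0 here, which is a choice made for this sketch only.

```shell
# Hypothetical helper: combine per-model decisions by majority vote.
# Each input file is assumed to hold lines of the form "utt_id prediction".
majority_vote() {
    awk '
        { total[$1]++; if ($2 == 1) yes[$1]++ }
        END {
            for (u in total)
                # predict 1 only if strictly more than half of the models say 1
                print u, ((2 * yes[u] > total[u]) ? 1 : 0)
        }
    ' "$@" | sort
}

# Toy example with three made-up prediction files.
tmp=$(mktemp -d)
printf '%s\n' 'utt1 1' 'utt2 0' > "$tmp/conformer.txt"
printf '%s\n' 'utt1 1' 'utt2 1' > "$tmp/transformer.txt"
printf '%s\n' 'utt1 0' 'utt2 0' > "$tmp/av.txt"
majority_vote "$tmp"/*.txt
rm -r "$tmp"
```

In the toy example, utt1 receives two of three positive votes and is labelled 1, while utt2 receives one of three and is labelled 0.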
- run the following commands to start training the audio transformer/conformer
python athena_wakeup/main.py examples/misp/kws/xtxt/configs/kws_audio_conformer.json
python athena_wakeup/main.py examples/misp/kws/xtxt/configs/kws_audio_transformer.json
- if you have multiple GPUs, you can train the models in parallel using the following commands
python athena_wakeup/horovod_main.py examples/misp/kws/xtxt/configs/kws_audio_conformer.json
python athena_wakeup/horovod_main.py examples/misp/kws/xtxt/configs/kws_audio_transformer.json
- the models will be stored in
examples/misp/kws/xtxt/ckpts/kws_audio_conformer
and examples/misp/kws/xtxt/ckpts/kws_audio_transformer
- focal loss will be used to fine-tune the model for further improvement
python athena_wakeup/main.py examples/misp/kws/xtxt/configs/kws_audio_transformer_finuetune_ft.json
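For reference, the focal loss in its standard form is shown below. This is the general definition, not a statement about how this recipe configures it: the values of the focusing parameter $\gamma$ and the class weight $\alpha_t$ used by the fine-tuning config are not given here.

$$
\mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma} \,\log(p_t)
$$

Here $p_t$ is the predicted probability of the true class. With $\gamma = 0$ and $\alpha_t = 1$ this reduces to the ordinary cross-entropy loss; a larger $\gamma$ down-weights examples the model already classifies confidently, so fine-tuning concentrates on the hard ones.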
- train the model using multi-modal data; the model will be stored in
examples/misp/kws/xtxt/ckpts/kws_av_transformer
python athena_wakeup/main.py examples/misp/kws/xtxt/configs/kws_av_transformer.json
- test the trained models; the FRR (false reject rate) and FAR (false alarm rate) will be shown
python athena_wakeup/test_main.py examples/misp/kws/xtxt/configs/kws_audio_conformer.json
python athena_wakeup/test_main.py examples/misp/kws/xtxt/configs/kws_audio_transformer.json
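To make the two metrics concrete: FRR is the fraction of wake-word utterances that were rejected, and FAR is the fraction of non-wake-word utterances that were accepted. The sketch below computes both from a hypothetical trial list with lines of the form `utt_id label prediction` (1 for wake word, 0 otherwise). This format is an assumption for illustration; it is not the output format of `test_main.py`.

```shell
# Hypothetical helper: compute FRR and FAR from "utt_id label prediction" lines,
# read from the given files or from standard input.
frr_far() {
    awk '
        $2 == 1 { pos++; if ($3 == 0) fr++ }   # wake word present, but rejected
        $2 == 0 { neg++; if ($3 == 1) fa++ }   # no wake word, but accepted
        END {
            if (pos == 0 || neg == 0) {
                print "need both positive and negative trials" > "/dev/stderr"
                exit 1
            }
            printf "FRR=%.4f FAR=%.4f\n", fr / pos, fa / neg
        }
    ' "$@"
}

# Toy example: one of two positives is rejected, no negative is accepted.
printf '%s\n' 'utt1 1 1' 'utt2 1 0' 'utt3 0 0' 'utt4 0 0' | frr_far
# prints "FRR=0.5000 FAR=0.0000"
```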
- test the trained audio-visual model
python athena_wakeup/test_main_av.py examples/misp/kws/xtxt/configs/kws_av_transformer.json
- once you have trained the audio transformer and the audio-visual transformer, you can use model voting to get better results