Add Consistency-Regularized CTC #1766
base: master
Results on the LibriSpeech dataset, compared with Zipformer, without an external language model:
Could you update RESULTS.md to include the URLs for the checkpoints and training logs of your PR?
Sure. Will do it later.
@@ -950,7 +943,6 @@ def compute_loss(
spec_augment=spec_augment,
supervision_segments=supervision_segments,
time_warp_factor=params.spec_aug_time_warp_factor,
I cannot find the definition of spec_aug_time_warp_factor.
It is defined in zipformer/asr_datamodule.py.
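To make the answer above concrete: argparse turns a dashed option such as `--spec-aug-time-warp-factor` into the attribute `spec_aug_time_warp_factor`, so an option registered in one module can surface as `params.spec_aug_time_warp_factor` in another. The sketch below is hypothetical. The function names `add_spec_aug_args` and `build_params` are illustrative, the default of 80 is shown only for demonstration (check zipformer/asr_datamodule.py for the real value and help text), and it assumes the training script merges the parsed arguments into its `params` object.

```python
import argparse
from types import SimpleNamespace


def add_spec_aug_args(parser: argparse.ArgumentParser) -> None:
    """Register a SpecAugment time-warp option (illustrative, not the real module)."""
    group = parser.add_argument_group(title="SpecAugment options (illustrative)")
    group.add_argument(
        "--spec-aug-time-warp-factor",
        type=int,
        default=80,  # illustrative default; verify against asr_datamodule.py
        help="Time-warp factor for SpecAugment; a value < 1 disables time warping.",
    )


def build_params(args: argparse.Namespace) -> SimpleNamespace:
    """Hypothetical stand-in for merging parsed args into a `params` object."""
    # argparse already converted dashes to underscores in the attribute names.
    return SimpleNamespace(**vars(args))


parser = argparse.ArgumentParser()
add_spec_aug_args(parser)
params = build_params(parser.parse_args(["--spec-aug-time-warp-factor", "0"]))
print(params.spec_aug_time_warp_factor)
```

The same mechanism explains why the reviewer found no definition in the training script itself: the attribute name is derived, not declared.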
An example training command using four 32 GB V100 GPUs:
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
--world-size 4 \
--num-epochs 50 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5 \
--use-cr-ctc 1 \
--use-ctc 1 \
--use-transducer 0 \
--use-attention-decoder 0 \
--enable-spec-aug 0 \
--cr-loss-scale 0.2 \
--time-mask-ratio 2.5 \
--full-libri 1 \
--max-duration 700 \
--master-port 12345
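The command disables dataloader-level SpecAugment (`--enable-spec-aug 0`) and sets `--time-mask-ratio 2.5`, while the diff hunk in this PR passes `spec_augment` and `time_warp_factor` into the model call. One plausible reading, which is an assumption and not confirmed here, is that augmentation moves inside the training step so that two independently masked views of each utterance are produced, with more time masking than usual. The sketch below illustrates only that idea in plain Python. `make_two_views`, `apply_time_masks`, and the base values of 10 masks and a 0.15 masked-frame fraction are hypothetical, chosen for illustration.

```python
import random
from typing import List, Tuple

Frames = List[List[float]]  # T frames, each a list of mel-bin values


def apply_time_masks(frames: Frames, num_masks: int, max_width: int,
                     rng: random.Random) -> Frames:
    """Return a copy of `frames` with up to `num_masks` spans of frames zeroed."""
    out = [list(frame) for frame in frames]  # copy so the input is not modified
    num_frames = len(out)
    for _ in range(num_masks):
        width = rng.randint(0, max_width)
        if width == 0 or width > num_frames:
            continue  # nothing to mask for this draw
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * len(out[t])
    return out


def make_two_views(frames: Frames, time_mask_ratio: float,
                   base_num_masks: int = 10, base_max_fraction: float = 0.15,
                   seed: int = 0) -> Tuple[Frames, Frames]:
    """Create two independently time-masked views of one utterance.

    Hypothetical scaling: both the number of masks and the total
    masked-frame budget grow linearly with `time_mask_ratio`.
    """
    rng = random.Random(seed)
    num_masks = int(base_num_masks * time_mask_ratio)
    budget = base_max_fraction * time_mask_ratio * len(frames)
    # Cap each mask so that num_masks * max_width never exceeds the budget.
    max_width = int(budget // num_masks) if num_masks > 0 else 0
    view_a = apply_time_masks(frames, num_masks, max_width, rng)
    view_b = apply_time_masks(frames, num_masks, max_width, rng)
    return view_a, view_b


utterance = [[1.0] * 4 for _ in range(200)]
view_a, view_b = make_two_views(utterance, time_mask_ratio=2.5)
```

In a batched PyTorch setting the same effect would come from duplicating the batch along the batch dimension and masking each copy independently; the two views then feed the consistency term.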
This PR implements Consistency-Regularized CTC (CR-CTC), proposed in https://arxiv.org/pdf/2410.05101. CR-CTC enforces consistency between the two CTC output distributions obtained from different augmented views of the same input mel-spectrogram, which significantly improves CTC performance. Please see the paper for more details.
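A minimal pure-Python sketch of the objective as described above, under stated assumptions: the consistency term is taken to be a symmetric per-frame KL divergence between the two views' CTC posteriors, where each KL term treats the other view as a fixed target (stop-gradient), and the total loss averages the two CTC losses and adds the consistency term weighted by a factor corresponding to `--cr-loss-scale`. The 0.5 factors and the function names `cr_loss` and `total_loss` are assumptions for illustration, not the code in this PR.

```python
import math
from typing import List, Sequence

LogProbs = List[List[float]]  # per-frame log-posteriors, shape (T, vocab)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one frame."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_from_log_probs(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) for one frame; both arguments are log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def cr_loss(log_probs_a: LogProbs, log_probs_b: LogProbs) -> float:
    """Symmetric per-frame KL between two views, summed over frames.

    In an autodiff framework the target side of each KL term would be
    detached (stop-gradient); plain floats carry no gradients, so that
    distinction does not appear here.
    """
    assert len(log_probs_a) == len(log_probs_b), "views must have equal length"
    total = 0.0
    for za, zb in zip(log_probs_a, log_probs_b):
        # KL(sg(z_b) || z_a) pulls view a toward view b, and vice versa.
        total += kl_from_log_probs(zb, za) + kl_from_log_probs(za, zb)
    return 0.5 * total


def total_loss(ctc_a: float, ctc_b: float, cr: float,
               cr_loss_scale: float = 0.2) -> float:
    """Average CTC loss over both views plus the weighted consistency term."""
    return 0.5 * (ctc_a + ctc_b) + cr_loss_scale * cr


view_a = [log_softmax([1.0, 2.0, 3.0]), log_softmax([0.0, 0.0, 0.0])]
view_b = [log_softmax([3.0, 2.0, 1.0]), log_softmax([0.0, 1.0, 0.0])]
print(cr_loss(view_a, view_b))  # positive, since the views disagree
```

In PyTorch the per-frame term maps naturally onto `torch.nn.functional.kl_div` with `log_target=True`, passing one view's log-probabilities as input and the other view's detached log-probabilities as target.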