Add Consistency-Regularized CTC #1766

Open · wants to merge 5 commits into `master`

Conversation

@yaozengwei (Collaborator) commented Oct 8, 2024:

This PR implements Consistency-Regularized CTC (CR-CTC), proposed in https://arxiv.org/pdf/2410.05101, which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. It significantly improves CTC performance; please see the paper for more details.
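As a rough sketch of the consistency term (not the PR's actual implementation): per the paper, the CR loss is a frame-level, bidirectional KL divergence between the CTC posteriors of the two augmented views, with the target side of each direction detached (stop-gradient) during training. A minimal NumPy version, assuming `log_p1` and `log_p2` are per-frame log-posteriors of shape `(T, V)`:

```python
import numpy as np

def cr_consistency_loss(log_p1: np.ndarray, log_p2: np.ndarray) -> float:
    # log_p1, log_p2: (T, V) log-probabilities over CTC output units,
    # one row per frame, from the two augmented views of the same utterance.
    # In real training code the target distribution in each KL direction
    # would be detached (stop-gradient); omitted here since NumPy has no autograd.
    p1, p2 = np.exp(log_p1), np.exp(log_p2)
    kl_12 = np.sum(p2 * (log_p2 - log_p1), axis=-1)  # per-frame KL(p2 || p1)
    kl_21 = np.sum(p1 * (log_p1 - log_p2), axis=-1)  # per-frame KL(p1 || p2)
    return float(0.5 * np.mean(kl_12 + kl_21))       # symmetrized, frame-averaged
```

The total training objective then adds this term, weighted by a scale factor (the `--cr-loss-scale` flag below), to the CTC losses of both views.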

@yaozengwei (Collaborator, Author):

On the LibriSpeech dataset, a WER (%) comparison with Zipformer baselines, without using an external language model:

| Model | Params (M) | test-clean | test-other |
|---|---|---|---|
| CTC/AED, Zipformer-S | 46.3 | 2.46 | 6.04 |
| CTC/AED, Zipformer-M | 90.0 | 2.22 | 4.97 |
| CTC/AED, Zipformer-L | 174.3 | 2.09 | 4.59 |
| Pruned transducer, Zipformer-S | 23.3 | 2.42 | 5.73 |
| Pruned transducer, Zipformer-M | 65.6 | 2.21 | 4.79 |
| Pruned transducer, Zipformer-L | 148.4 | 2.00 | 4.38 |
| CTC, Zipformer-S | 22.1 | 2.85 | 6.89 |
| CTC, Zipformer-M | 64.3 | 2.52 | 6.02 |
| CTC, Zipformer-L | 147.0 | 2.5 | 5.72 |
| CR-CTC, Zipformer-S | 22.1 | 2.52 | 5.85 |
| CR-CTC, Zipformer-M | 64.3 | 2.1 | 4.61 |
| CR-CTC, Zipformer-L | 147.0 | 2.02 | 4.35 |
| CR-CTC/AED, Zipformer-L | 174.3 | 1.96 | 4.08 |
| Pruned transducer w/ CR-CTC, Zipformer-L | 148.8 | 1.88 | 3.95 |

@csukuangfj (Collaborator):

Could you update RESULTS.md to include the URLs for the checkpoints and training logs of your PR?

@yaozengwei (Collaborator, Author) commented Oct 8, 2024:

> Could you update RESULTS.md to include the URLs for the checkpoints and training logs of your PR?

Sure. Will do it later.

@@ -950,7 +943,6 @@ def compute_loss(
spec_augment=spec_augment,
supervision_segments=supervision_segments,
time_warp_factor=params.spec_aug_time_warp_factor,
Contributor:

Cannot find the definition of spec_aug_time_warp_factor.

@yaozengwei (Collaborator, Author) replied Oct 8, 2024:

It is defined in zipformer/asr_datamodule.py.
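For readers unfamiliar with the recipe layout: icefall data modules typically register dataset/augmentation options as argparse flags, which is how `spec_aug_time_warp_factor` reaches `compute_loss` via `params`. A hypothetical sketch of how such a flag might be registered (the function name and default value here are illustrative, not copied from the PR):

```python
import argparse

def add_spec_aug_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical helper mirroring the flag discussed above; the default
    # value of 80 is illustrative.
    parser.add_argument(
        "--spec-aug-time-warp-factor",
        type=int,
        default=80,
        help="Max time-warp factor for SpecAugment; a value < 1 disables time warping.",
    )

parser = argparse.ArgumentParser()
add_spec_aug_args(parser)
args = parser.parse_args(["--spec-aug-time-warp-factor", "80"])
```

The training script then reads the parsed value into its `params` object, which is why the definition lives in the data module rather than in train.py.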

@yaozengwei (Collaborator, Author) commented Oct 9, 2024:

An example training command using 4 × 32 GB V100 GPUs:

```shell
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5 \
  --use-cr-ctc 1 \
  --use-ctc 1 \
  --use-transducer 0 \
  --use-attention-decoder 0 \
  --enable-spec-aug 0 \
  --cr-loss-scale 0.2 \
  --time-mask-ratio 2.5 \
  --full-libri 1 \
  --max-duration 700 \
  --master-port 12345
```
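Note that the command disables the standard SpecAugment path (`--enable-spec-aug 0`) and instead uses `--time-mask-ratio 2.5`, which per the paper corresponds to applying heavier time masking when generating the two augmented views. A minimal sketch of time masking on a mel-spectrogram, assuming a `(T, F)` feature matrix (the function and its parameters are illustrative, not the recipe's actual code):

```python
import numpy as np

def time_mask(features: np.ndarray, num_masks: int, max_width: int,
              rng: np.random.Generator) -> np.ndarray:
    # Zero out `num_masks` random time spans, each 1..max_width frames wide.
    # A larger time-mask ratio would translate into more/wider spans.
    x = features.copy()
    T = x.shape[0]
    for _ in range(num_masks):
        w = int(rng.integers(1, max_width + 1))   # span width in frames
        t0 = int(rng.integers(0, max(T - w, 1)))  # span start frame
        x[t0:t0 + w] = 0.0
    return x
```

In CR-CTC training, each utterance would be masked twice with independent random draws to produce the two views whose CTC posteriors are regularized toward each other.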
