Add Consistency-Regularized CTC #1766

yaozengwei · 2024-10-08T02:31:05Z

This PR implements Consistency-Regularized CTC (CR-CTC), proposed in https://arxiv.org/pdf/2410.05101.
CR-CTC enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram, which significantly improves CTC performance. Please see the paper for details.
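
To make the idea concrete, here is a minimal, framework-free sketch of the consistency term described above: a symmetric KL divergence between the per-frame distributions produced for two augmented views. It is not the code in this PR. The function names, the averaging of the two CTC terms, and the toy inputs are assumptions for illustration; only the default weight of 0.2 is taken from the `--cr-loss-scale` value used in the training command in this thread. In a real training setup, the target side of each KL term would typically be treated as a constant (stop-gradient), which plain Python cannot express.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_div(log_p, log_q):
    """D_KL(p || q) for a single frame, given log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between the two views, summed over frames.

    logits_a, logits_b: lists of frames, each frame a list of vocabulary logits.
    With autograd, the first argument of each kl_div call (the target)
    would be detached; that detail is omitted here.
    """
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        log_pa, log_pb = log_softmax(frame_a), log_softmax(frame_b)
        total += 0.5 * (kl_div(log_pb, log_pa) + kl_div(log_pa, log_pb))
    return total


def cr_ctc_loss(ctc_a, ctc_b, cr, cr_loss_scale=0.2):
    """Combine the two per-view CTC losses with the weighted consistency term.

    Averaging the two CTC losses is an assumption of this sketch.
    """
    return 0.5 * (ctc_a + ctc_b) + cr_loss_scale * cr


# Toy example: 2 frames, vocabulary of 3 symbols, two slightly different views.
view_a = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
view_b = [[1.5, 0.7, -0.5], [0.0, 0.4, 2.5]]
cr = consistency_loss(view_a, view_b)
# The CTC loss values below are placeholders, not measured numbers.
print(cr_ctc_loss(ctc_a=12.3, ctc_b=12.9, cr=cr))
```

The consistency term is zero when both views yield identical distributions and grows as they diverge, so it penalizes predictions that change under augmentation.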

yaozengwei · 2024-10-08T02:47:42Z

Comparison of Zipformer models on the LibriSpeech dataset, without an external language model (WER, %):

Model	Params (M)	test-clean	test-other
CTC/AED, Zipformer-S	46.3	2.46	6.04
CTC/AED, Zipformer-M	90.0	2.22	4.97
CTC/AED, Zipformer-L	174.3	2.09	4.59
Pruned transducer, Zipformer-S	23.3	2.42	5.73
Pruned transducer, Zipformer-M	65.6	2.21	4.79
Pruned transducer, Zipformer-L	148.4	2.00	4.38
CTC, Zipformer-S	22.1	2.85	6.89
CTC, Zipformer-M	64.3	2.52	6.02
CTC, Zipformer-L	147.0	2.50	5.72
CR-CTC, Zipformer-S	22.1	2.52	5.85
CR-CTC, Zipformer-M	64.3	2.10	4.61
CR-CTC, Zipformer-L	147.0	2.02	4.35
CR-CTC/AED, Zipformer-L	174.3	1.96	4.08
Pruned transducer w/ CR-CTC, Zipformer-L	148.8	1.88	3.95
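
As a reading aid for the table above, the short script below computes the relative WER reduction of CR-CTC over plain CTC at each model size. The numbers are copied from the CTC and CR-CTC rows of the table; the helper name `relative_reduction` is just an illustrative choice.

```python
# WER (%) values copied from the table above: (test-clean, test-other).
ctc = {"S": (2.85, 6.89), "M": (2.52, 6.02), "L": (2.50, 5.72)}
cr_ctc = {"S": (2.52, 5.85), "M": (2.10, 4.61), "L": (2.02, 4.35)}


def relative_reduction(baseline, improved):
    """Relative reduction in percent, e.g. 10.0 -> 9.0 gives 10.0."""
    return 100.0 * (baseline - improved) / baseline


for size in ("S", "M", "L"):
    clean = relative_reduction(ctc[size][0], cr_ctc[size][0])
    other = relative_reduction(ctc[size][1], cr_ctc[size][1])
    print(f"Zipformer-{size}: test-clean {clean:.1f}%, test-other {other:.1f}%")
```

Since the parameter counts are identical between the CTC and CR-CTC rows, these reductions come without any increase in model size.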

csukuangfj · 2024-10-08T02:49:52Z

Could you update RESULTS.md to include the URLs for the checkpoints and training logs of your PR?

yaozengwei · 2024-10-08T02:51:22Z

> Could you update RESULTS.md to include the URLs for the checkpoints and training logs of your PR?

Sure. Will do it later.

kobenaxie · 2024-10-08T03:49:58Z

egs/librispeech/ASR/zipformer/train.py

@@ -950,7 +943,6 @@ def compute_loss(
            spec_augment=spec_augment,
            supervision_segments=supervision_segments,
            time_warp_factor=params.spec_aug_time_warp_factor,


I cannot find the definition of spec_aug_time_warp_factor.

yaozengwei · 2024-10-08

It is defined in zipformer/asr_datamodule.py.

yaozengwei · 2024-10-09T13:24:54Z

An example training command using 4 × 32 GB V100 GPUs:

export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 50 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5 \
  --use-cr-ctc 1 \
  --use-ctc 1 \
  --use-transducer 0 \
  --use-attention-decoder 0 \
  --enable-spec-aug 0 \
  --cr-loss-scale 0.2 \
  --time-mask-ratio 2.5 \
  --full-libri 1 \
  --max-duration 700 \
  --master-port 12345

yaozengwei added 5 commits September 4, 2024 14:27

ebbbcbc support consistency-regularized CTC
07d6b12 update arguments of cr-ctc
cf796ee set default value of cr_loss_masked_scale to 1.0
a6eead6 minor fix
ae59e5d refactor codes

kobenaxie reviewed Oct 8, 2024


yaozengwei mentioned this pull request Oct 10, 2024

[Not for merge] Add Smooth-Regularized CTC #1769

Open
