Flashlight dependencies for RASR and slimIPL #986

Open · wants to merge 24 commits into main
2 changes: 1 addition & 1 deletion recipes/rasr/README.md
@@ -5,7 +5,7 @@ This is a repository sharing pre-trained acoustic models and language models for

## Dependencies

* [`Flashlight`](https://github.com/flashlight/flashlight)
* [`Flashlight`](https://github.com/flashlight/flashlight); models are trained and tested at commit d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67 (the Conformer implementation changed after this commit, so the pre-trained Conformer models cannot be used with later commits). See the checkout sketch after this list.
* [`Flashlight` ASR app](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr)
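
A minimal sketch of pinning Flashlight to the commit named above when building from source. This is not part of the original README; the clone location and any build steps beyond the checkout are assumptions:

```bash
# Clone Flashlight and check out the commit the RASR models were trained/tested with.
git clone https://github.com/flashlight/flashlight.git
cd flashlight
git checkout d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67
# Build Flashlight and the ASR app from this commit following the upstream build instructions.
```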

## Models
4 changes: 4 additions & 0 deletions recipes/slimIPL/README.md
@@ -2,6 +2,10 @@

Recent results in end-to-end automatic speech recognition have demonstrated the efficacy of pseudo-labeling for semi-supervised models trained both with Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single model using pseudo-labels iteratively re-generated as the model learns, has been shown to further improve performance in ASR. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model. We call this approach Language-Model-Free IPL (slimIPL) and give a resultant training setup for low-resource settings with CTC-based models. slimIPL features a dynamic cache for pseudo-labels which reduces sensitivity to changes in relabeling hyperparameters and results in improved training stability. slimIPL is also highly-efficient and requires 3.5-4x fewer computational resources to converge than other state-of-the-art semi/self-supervised approaches. With only 10 hours of labeled audio, slimIPL is competitive with self-supervised approaches, and is state-of-the-art with 100 hours of labeled audio without the use of a language model both at test time and during pseudo-label generation.

## Dependency

- [`Flashlight`](https://github.com/flashlight/flashlight); all code is tested at commit 03c51129f320eed7ff0d416f7e8291a029439039. See the checkout sketch below.
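
As with the RASR recipe, a sketch (an assumption, not from the original README) for checking out the tested commit before building Flashlight:

```bash
# Check out the commit the slimIPL code was tested with, then build Flashlight as usual.
git clone https://github.com/flashlight/flashlight.git
cd flashlight
git checkout 03c51129f320eed7ff0d416f7e8291a029439039
```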

## Training

All models are trained on 16 GPUs. To run training