diff --git a/recipes/rasr/README.md b/recipes/rasr/README.md
index 170ad283..6e69485f 100644
--- a/recipes/rasr/README.md
+++ b/recipes/rasr/README.md
@@ -5,7 +5,7 @@ This is a repository sharing pre-trained acoustic models and language models for
 
 ## Dependencies
 
-* [`Flashlight`](https://github.com/flashlight/flashlight)
+* [`Flashlight`](https://github.com/flashlight/flashlight); models are trained and tested at commit d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67 (the conformer implementation changed after this commit, so the pre-trained conformer models cannot be used with later commits)
 * [`Flashlight` ASR app](https://github.com/flashlight/flashlight/tree/master/flashlight/app/asr)
 
 ## Models
diff --git a/recipes/slimIPL/README.md b/recipes/slimIPL/README.md
index 688c2d2e..0c9432ad 100644
--- a/recipes/slimIPL/README.md
+++ b/recipes/slimIPL/README.md
@@ -2,6 +2,10 @@
 
 Recent results in end-to-end automatic speech recognition have demonstrated the efficacy of pseudo-labeling for semi-supervised models trained both with Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single model using pseudo-labels iteratively re-generated as the model learns, has been shown to further improve performance in ASR. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model. We call this approach Language-Model-Free IPL (slimIPL) and give a resultant training setup for low-resource settings with CTC-based models. slimIPL features a dynamic cache for pseudo-labels which reduces sensitivity to changes in relabeling hyperparameters and results in improved training stability. slimIPL is also highly-efficient and requires 3.5-4x fewer computational resources to converge than other state-of-the-art semi/self-supervised approaches. With only 10 hours of labeled audio, slimIPL is competitive with self-supervised approaches, and is state-of-the-art with 100 hours of labeled audio without the use of a language model both at test time and during pseudo-label generation.
 
+## Dependency
+
+- flashlight https://github.com/flashlight/flashlight (all code is tested at commit 03c51129f320eed7ff0d416f7e8291a029439039)
+
 ## Training
 
 All models are trained on 16 gpus. To run training
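
Both hunks pin `flashlight` to an exact commit for reproducibility. A minimal sketch of fetching the pinned revision for the rasr recipe (the clone location is arbitrary, and the subsequent CMake build steps are not part of this diff):

```sh
# Clone flashlight and check out the commit the rasr models were trained/tested with.
# For the slimIPL recipe, check out 03c51129f320eed7ff0d416f7e8291a029439039 instead.
git clone https://github.com/flashlight/flashlight.git
cd flashlight
git checkout d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67
```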