This is a simple PyTorch implementation of the paper "State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention with Dilated 1D Convolutions".
See main.ipynb for training and inference details.
- Uses pytorch_lightning as the training backbone
- Trains on the popular LibriSpeech dataset
- Modules are written in pure PyTorch
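
To give a feel for the architecture named in the paper title, here is a minimal sketch of a multi-stream block in pure PyTorch. It is not the code from this repository: the class names `Stream` and `MultiStream`, the dilation rates `(1, 2, 3)`, the layer counts, and the normalization choices are all assumptions made for illustration. Each stream applies self-attention followed by a stack of 1D convolutions with its own dilation rate, and the stream outputs are concatenated along the feature axis.

```python
import torch
import torch.nn as nn


class Stream(nn.Module):
    """One stream: self-attention, then a stack of dilated 1D convolutions.

    Hypothetical module for illustration; hyperparameters are assumptions.
    """

    def __init__(self, dim, dilation, num_heads=4, num_convs=3, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # "Same" padding for odd kernel sizes keeps the time axis unchanged.
        padding = dilation * (kernel_size - 1) // 2
        layers = []
        for _ in range(num_convs):
            layers += [
                nn.Conv1d(dim, dim, kernel_size, dilation=dilation, padding=padding),
                nn.BatchNorm1d(dim),
                nn.ReLU(),
            ]
        self.convs = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, time, dim)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm(x + attn_out)        # residual connection + layer norm
        x = self.convs(x.transpose(1, 2))  # Conv1d expects (batch, dim, time)
        return x.transpose(1, 2)           # back to (batch, time, dim)


class MultiStream(nn.Module):
    """Runs several streams in parallel and concatenates their outputs."""

    def __init__(self, dim, dilations=(1, 2, 3)):
        super().__init__()
        self.streams = nn.ModuleList([Stream(dim, d) for d in dilations])

    def forward(self, x):
        # Output feature size is dim * number of streams.
        return torch.cat([stream(x) for stream in self.streams], dim=-1)


if __name__ == "__main__":
    model = MultiStream(dim=64).eval()
    features = torch.randn(2, 50, 64)  # (batch, time, dim) dummy input
    with torch.no_grad():
        out = model(features)
    print(out.shape)
```

Because each stream uses a different dilation rate, the streams see the same input at different temporal resolutions. A real acoustic model would add an input embedding in front of this block and a projection plus output layer after it.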