PyTorch and TFRecords data loaders for several audio datasets
Datasets
- ESC - dataset of environmental sounds
- LibriSpeech - corpus of read English speech
  - LibriSpeech downloader for PyTorch
  - PyTorch Dataset
  - PyTorch Dataset for TFRecord
  - PyTorch DataLoaders for TFRecord
  - TFRecords Loader
  - TFRecords Generator
- NSynth - dataset of annotated musical notes
  - NSynth downloader and generator of *.h5py and *.tfrecord formats
  - TFRecord reader
  - PyTorch Dataset
  - PyTorch Dataset for TFRecord
  - PyTorch DataLoaders for TFRecord
- VoxCeleb2 - human speech, extracted from YouTube interview videos
  - PyTorch loader
  - TFRecords loader
- GTZAN - audio tracks from a variety of sources annotated with genre class
- CallCenter - audio tracks with human and non-human speech
For validation we frequently use the following scheme:
- Read 10 random crops from a file;
- Predict a class for each crop;
- Average the per-crop predictions.
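The scheme above can be sketched roughly as follows. This is a minimal illustration in NumPy, not the repository's actual loaders: `random_crops`, `predict_file`, the `predict_fn` callable, and the zero-padding of files shorter than one crop are all assumptions made for the example.

```python
import numpy as np


def random_crops(waveform, crop_len, n_crops=10, rng=None):
    """Return an array of shape (n_crops, crop_len) with random crops of a 1-D signal."""
    if rng is None:
        rng = np.random.default_rng()
    if len(waveform) < crop_len:
        # Assumption: files shorter than one crop are zero-padded at the end.
        waveform = np.pad(waveform, (0, crop_len - len(waveform)))
    max_start = len(waveform) - crop_len
    # Upper bound of integers() is exclusive, hence the +1.
    starts = rng.integers(0, max_start + 1, size=n_crops)
    return np.stack([waveform[s:s + crop_len] for s in starts])


def predict_file(waveform, predict_fn, crop_len, n_crops=10, rng=None):
    """Average per-crop class scores; return (predicted_class, mean_scores)."""
    crops = random_crops(waveform, crop_len, n_crops, rng)
    # predict_fn is a hypothetical model wrapper that maps
    # (n_crops, crop_len) -> (n_crops, n_classes) scores.
    scores = np.asarray(predict_fn(crops))
    mean_scores = scores.mean(axis=0)
    return int(mean_scores.argmax()), mean_scores
```

In practice `predict_fn` would wrap a trained model's forward pass; averaging scores (rather than taking a majority vote over per-crop labels) is one possible reading of the averaging step.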
For this scheme we provide additional DataLoaders for PyTorch: