This document describes the steps to reproduce the tuning results of the Intel-optimized PyTorch RNNT model with Neural Compressor.
Our example comes from the MLPerf Inference Benchmark Suite.
Python 3.6 or higher is recommended.
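As a quick sanity check, the interpreter version can be verified from Python itself. This is a minimal illustrative sketch; the repository scripts do not ship such a check:

```python
import sys

# The RNNT example recommends Python 3.6 or higher.
REQUIRED = (3, 6)

def check_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when the running interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= required

if not check_python():
    raise RuntimeError("Python %d.%d or higher is required" % REQUIRED)
```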
cd examples/pytorch/speech_recognition/rnnt/quantization/ptq_dynamic/fx
pip install -r requirements.txt
Check your gcc version with the command: gcc -v
GCC 5 or above is required.
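The major version can also be checked programmatically. The sketch below assumes `gcc -dumpversion` is available (it prints a version string such as `9.4.0`, or just `9` on newer releases); it is an illustrative helper, not part of the example scripts:

```python
import re
import subprocess

def gcc_major(version_string):
    """Extract the major version from strings like '9.4.0', '11', or
    'gcc version 5.4.0 20160609'."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_string)
    if not match:
        raise ValueError("cannot parse gcc version from: %r" % version_string)
    return int(match.group(1))

def check_gcc(minimum=5):
    """Run `gcc -dumpversion` and verify the major version (GCC 5+ required)."""
    out = subprocess.run(["gcc", "-dumpversion"],
                         stdout=subprocess.PIPE, universal_newlines=True,
                         check=True).stdout
    return gcc_major(out) >= minimum
```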
# install mlperf
bash prepare_loadgen.sh
bash prepare_dataset.sh --download_dir=origin_dataset --convert_dir=convert_dataset
prepare_dataset.sh contains two stages:
- stage 1: download the LibriSpeech dev-clean dataset and extract it.
- stage 2: convert the .flac files to .wav files.
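Stage 2 amounts to building one conversion command per audio file. The helper below is a hypothetical sketch that assumes ffmpeg (the actual script may use a different tool such as sox) and only constructs the command line; the 16 kHz mono settings match the LibriSpeech recordings:

```python
import os

def flac_to_wav_cmd(flac_path, convert_dir):
    """Build an ffmpeg command converting one .flac file to a 16 kHz mono .wav.

    Illustrative helper only; it is not part of prepare_dataset.sh.
    """
    base = os.path.splitext(os.path.basename(flac_path))[0]
    wav_path = os.path.join(convert_dir, base + ".wav")
    # -ar: output sample rate, -ac: number of audio channels
    return ["ffmpeg", "-i", flac_path, "-ar", "16000", "-ac", "1", wav_path]
```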
wget https://zenodo.org/record/3662521/files/DistributedDataParallel_1576581068.9962234-epoch-100.pt?download=1 -O rnnt.pt
The following changes were made relative to the original MLPerf code:
- pytorch_SUT.py: removed the jit script conversion.
- pytorch/decoders.py: removed the assertion of torch.jit.ScriptModule.
bash run_tuning.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --output_model=saved_results
# fp32
bash run_benchmark.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --mode=performance/accuracy --int8=false
# int8
bash run_benchmark.sh --dataset_location=convert_dataset --input_model=./rnnt.pt --mode=performance/accuracy --int8=true

Note: set --mode to either performance or accuracy; each invocation runs one mode.
Each result below is an [accuracy, time] pair: the first value is the accuracy in percent, the second is the time usage in seconds.
- FP32 baseline is: [92.5477, 796.7552].
- Tune 1 result is: [91.5872, 1202.2529]
- Tune 2 result is: [91.5894, 1201.3231]
- Tune 3 result is: [91.5195, 1211.5965]
- Tune 4 result is: [91.6030, 1218.2211]
- Tune 5 result is: [91.4812, 1169.5080]
- ...
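Neural Compressor accepts an int8 result when its accuracy stays within a configured relative tolerance of the FP32 baseline. A minimal sketch of that comparison over the tuning results above; the 1% tolerance is an assumed example value, not necessarily the setting used for this run:

```python
def relative_drop(baseline_acc, tuned_acc):
    """Relative accuracy loss of a tuned model versus the FP32 baseline."""
    return (baseline_acc - tuned_acc) / baseline_acc

def meets_criterion(baseline_acc, tuned_acc, tolerance=0.01):
    """True when the tuned accuracy is within `tolerance` relative loss."""
    return relative_drop(baseline_acc, tuned_acc) <= tolerance

baseline = 92.5477  # FP32 accuracy from the baseline above
tunes = [91.5872, 91.5894, 91.5195, 91.6030, 91.4812]
drops = [relative_drop(baseline, t) for t in tunes]
```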