This example loads an SSD MobileNet V1 object detection model converted from TensorFlow and confirms its accuracy and speed on the MS COCO 2017 dataset.
pip install neural-compressor
pip install -r requirements.txt
Note: refer to the list of validated ONNX Runtime versions.
python prepare_model.py --output_model='ssd_mobilenet_v1_coco_2018_01_28.onnx'
Note: if this step fails with "ValueError: Could not infer attribute explicit_paddings type from empty iterator", pin "onnx==1.14.1" for now. Refer to this link for more details of this error.
Download the MS COCO 2017 dataset.
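As a minimal sketch, the validation images and annotations can be fetched from the standard COCO download URLs; the ./coco target directory is illustrative, so adjust it to match the --dataset_location you pass later:

```shell
# Download COCO 2017 validation images and annotations (target directory is illustrative)
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip val2017.zip -d ./coco
unzip annotations_trainval2017.zip -d ./coco
```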
Neural Compressor offers quantization and benchmark diagnosis. Adding the diagnosis parameter to the quantization or benchmark config will provide additional details useful for diagnostics.
from neural_compressor.config import PostTrainingQuantConfig, BenchmarkConfig

# enable diagnosis during quantization
config = PostTrainingQuantConfig(
    diagnosis=True,
    ...
)
# enable diagnosis during benchmarking
config = BenchmarkConfig(
    diagnosis=True,
    ...
)
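For context, a diagnosis-enabled config is passed to quantization.fit like any other config. The sketch below uses a dummy calibration dataset as a stand-in for real val2017 preprocessing; the NHWC input shape, uint8 dtype, and output path are assumptions for this model, not part of this example's scripts:

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets

# diagnosis=True dumps extra per-op statistics useful for debugging accuracy drops
config = PostTrainingQuantConfig(diagnosis=True, quant_format="QOperator")

# A dummy calibration set stands in for real val2017 preprocessing here;
# the (1, 300, 300, 3) uint8 NHWC input is an assumption for this model.
dataset = Datasets("onnxrt_qlinearops")["dummy"](shape=(1, 300, 300, 3), dtype="uint8")
calib_dataloader = DataLoader(framework="onnxruntime", dataset=dataset)

q_model = quantization.fit(
    model="ssd_mobilenet_v1_coco_2018_01_28.onnx",  # produced by prepare_model.py
    conf=config,
    calib_dataloader=calib_dataloader,
)
q_model.save("path/to/save")  # illustrative output path
```

In the actual runs below, run_quant.sh wraps this call and supplies the real COCO calibration pipeline.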
Static quantization with QOperator format:
bash run_quant.sh --input_model=path/to/model \ # model path as *.onnx
--output_model=path/to/save \ # model path as *.onnx
--dataset_location=path/to/val2017 \
--quant_format="QOperator"
Static quantization with QDQ format:
bash run_quant.sh --input_model=path/to/model \ # model path as *.onnx
--output_model=path/to/save \ # model path as *.onnx
--dataset_location=path/to/val2017 \
--quant_format="QDQ"
bash run_benchmark.sh --input_model=path/to/model \ # model path as *.onnx
--dataset_location=path/to/val2017 \
--mode=performance # or accuracy