Step-by-Step

This example loads an object detection model converted from TensorFlow and confirms its accuracy and speed on the MS COCO 2017 dataset.

Prerequisite

1. Environment

pip install neural-compressor
pip install -r requirements.txt

Note: Refer to the list of validated ONNX Runtime versions.

2. Prepare Model

python prepare_model.py --output_model='ssd_mobilenet_v1_coco_2018_01_28.onnx'

Note: For now, use onnx==1.14.1 in this step if you get the error ValueError: Could not infer attribute explicit_paddings type from empty iterator. Refer to this link for more details.
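
For example, pin the version before running the script:

pip install onnx==1.14.1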

3. Prepare Dataset

Download the MS COCO 2017 dataset.
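
A minimal sketch to sanity-check the layout before running the scripts, assuming the standard COCO directory structure (val2017/ images plus annotations/instances_val2017.json; the coco_root path is a placeholder):

from pathlib import Path

coco_root = Path("path/to/coco")  # placeholder: wherever you extracted the dataset
val_images = coco_root / "val2017"  # pass this directory as --dataset_location
annotations = coco_root / "annotations" / "instances_val2017.json"

# Fail early if the images or annotation file are missing.
assert val_images.is_dir(), f"missing image folder: {val_images}"
assert annotations.is_file(), f"missing annotation file: {annotations}"
print(f"found {len(list(val_images.glob('*.jpg')))} validation images")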

Run

Diagnosis

Neural Compressor offers quantization and benchmark diagnosis. Setting the diagnosis parameter to True in the Quantization or Benchmark config provides additional details that are useful for debugging accuracy and performance issues.

Quantization diagnosis

from neural_compressor import PostTrainingQuantConfig

config = PostTrainingQuantConfig(
    diagnosis=True,
    ...
)
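
A minimal end-to-end sketch of how a diagnosis-enabled quantization might look with Neural Compressor's quantization.fit entry point (the model file name matches the one prepared above; calib_dataloader is assumed to be a COCO calibration dataloader you have defined, and the output file name is hypothetical):

from neural_compressor import PostTrainingQuantConfig, quantization

config = PostTrainingQuantConfig(diagnosis=True)  # emit extra per-op statistics

q_model = quantization.fit(
    model="ssd_mobilenet_v1_coco_2018_01_28.onnx",
    conf=config,
    calib_dataloader=calib_dataloader,  # assumption: a calibration dataloader over val2017
)
q_model.save("ssd_mobilenet_v1_int8.onnx")  # hypothetical output name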

Benchmark diagnosis

from neural_compressor import BenchmarkConfig

config = BenchmarkConfig(
    diagnosis=True,
    ...
)
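
Similarly, a sketch of benchmarking with diagnosis enabled via Neural Compressor's benchmark fit entry point (eval_dataloader is assumed to be a COCO evaluation dataloader you have defined):

from neural_compressor import BenchmarkConfig
from neural_compressor.benchmark import fit

config = BenchmarkConfig(diagnosis=True)  # extra detail in the benchmark report

fit(
    model="ssd_mobilenet_v1_coco_2018_01_28.onnx",
    conf=config,
    b_dataloader=eval_dataloader,  # assumption: an evaluation dataloader over val2017
)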

1. Quantization

Static quantization with QOperator format:

# input_model and output_model are *.onnx model paths
bash run_quant.sh --input_model=path/to/model \
                  --output_model=path/to/save \
                  --dataset_location=path/to/val2017 \
                  --quant_format="QOperator"

Static quantization with QDQ format:

# input_model and output_model are *.onnx model paths
bash run_quant.sh --input_model=path/to/model \
                  --output_model=path/to/save \
                  --dataset_location=path/to/val2017 \
                  --quant_format="QDQ"
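
The two commands differ only in quant_format, which presumably maps to the quant_format option of PostTrainingQuantConfig inside the script; a sketch of the distinction:

from neural_compressor import PostTrainingQuantConfig

# QOperator produces fused quantized operators (e.g. QLinearConv), while
# QDQ keeps the original operators wrapped in QuantizeLinear/DequantizeLinear pairs.
qoperator_config = PostTrainingQuantConfig(quant_format="QOperator")
qdq_config = PostTrainingQuantConfig(quant_format="QDQ")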

2. Benchmark

# input_model is an *.onnx model path
bash run_benchmark.sh --input_model=path/to/model \
                      --dataset_location=path/to/val2017 \
                      --mode=performance # or accuracy
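
For reference, performance mode corresponds roughly to a timed run through BenchmarkConfig; the warmup and iteration values below are illustrative, not the script's actual settings:

from neural_compressor import BenchmarkConfig
from neural_compressor.benchmark import fit

# Illustrative knobs: skip 10 warmup runs, then time 100 iterations.
config = BenchmarkConfig(warmup=10, iteration=100)
fit(
    model="path/to/model.onnx",  # placeholder path
    conf=config,
    b_dataloader=eval_dataloader,  # assumption: an evaluation dataloader over val2017
)

Accuracy mode instead evaluates the model over the full val2017 set (COCO mAP for this example).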