Optimization Orchestration

1. Introduction
    1.1. One-shot
2. Orchestration Support Matrix
3. Get Started with Orchestration API
4. Examples

Introduction

Orchestration is the combination of multiple optimization techniques applied simultaneously (one-shot). Intel Neural Compressor supports arbitrary meaningful combinations of its supported optimization methods under one-shot, such as pruning during quantization-aware training.

One-shot

Quantization-aware training, pruning, and distillation all rely on the training process, so arbitrary meaningful combinations of them can be applied in a single (one-shot) training run. Such a combination often yields better performance and accuracy than a single compression technique, and is usually about as efficient as applying just one. Three such combinations are listed below; see the Orchestration Support Matrix for all supported combinations.

Pruning during quantization-aware training
Distillation with pattern lock pruning
Distillation with pattern lock pruning and quantization-aware training
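
To make the one-shot idea concrete, here is a minimal sketch of distillation combined with pattern-lock pruning inside one training loop. It is a toy in pure Python and does not use Neural Compressor; the function names, the linear model, the `alpha` weighting, and the data are all assumptions made for illustration. "Pattern lock" is taken here to mean that weights which start at zero stay at zero throughout training.

```python
# Toy one-shot run: a distillation term and a pattern-lock mask share ONE training loop.
# Pure Python, no Neural Compressor calls; every name here is hypothetical.

def predict(weights, x):
    """Linear model: dot product of weights and inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def one_shot_train(weights, data, teacher, alpha=0.5, lr=0.05, epochs=200):
    # Pattern lock: remember which weights are already zero and keep them zero.
    mask = [0.0 if w == 0.0 else 1.0 for w in weights]
    weights = list(weights)
    for _ in range(epochs):
        for x, label in data:
            y = predict(weights, x)
            t = predict(teacher, x)
            # Derivative w.r.t. y of (1 - alpha) * (y - label)^2 + alpha * (y - t)^2
            dloss_dy = 2 * (1 - alpha) * (y - label) + 2 * alpha * (y - t)
            # SGD step, then re-apply the mask so the sparsity pattern is preserved.
            weights = [(w - lr * dloss_dy * xi) * m
                       for w, xi, m in zip(weights, x, mask)]
    return weights

teacher = [1.0, -2.0, 0.5]
data = [
    ([1.0, 0.0, 0.0], 1.0),
    ([0.0, 1.0, 0.0], -2.0),
    ([0.0, 0.0, 1.0], 0.5),
    ([1.0, 1.0, 1.0], -0.5),
]
# The middle weight starts pruned (zero) and must remain pruned after training.
student = one_shot_train([0.1, 0.0, 0.1], data, teacher)
```

Both techniques act on the same gradient step, so no second training pass is needed; the pruned weight is never updated while the remaining weights learn from the label and the teacher together.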

Orchestration Support Matrix

Orchestration	Combinations	Supported
One-shot	Pruning + Quantization-Aware Training	✔
One-shot	Distillation + Quantization-Aware Training	✔
One-shot	Distillation + Pruning	✔
One-shot	Distillation + Pruning + Quantization-Aware Training	✔
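
As a quick way to check a planned combination against the matrix above, the rows can be encoded as sets. This is only a sketch: the string labels and the helper `is_supported_one_shot` are hypothetical and are not identifiers from Neural Compressor.

```python
# Support matrix rows encoded as sets of technique names.
# Labels and helper name are hypothetical, chosen for illustration only.
SUPPORTED_ONE_SHOT = {
    frozenset({"pruning", "qat"}),
    frozenset({"distillation", "qat"}),
    frozenset({"distillation", "pruning"}),
    frozenset({"distillation", "pruning", "qat"}),
}

def is_supported_one_shot(*techniques):
    """Return True if this exact combination appears as a row in the matrix."""
    return frozenset(techniques) in SUPPORTED_ONE_SHOT
```

Using `frozenset` makes the check independent of the order in which techniques are named.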

Get Started with Orchestration API

Neural Compressor defines a Scheduler class that automatically pipelines the execution of model optimizations in a one-shot way.

Users instantiate the model optimization components, such as quantization, pruning, and distillation, separately. They can then append these optimization objects to the scheduler's pipeline, and the scheduler API executes them one by one.
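
The append-then-execute pattern described above can be sketched with a toy pipeline. This is not the Neural Compressor implementation: `ToyPipeline`, `ExtraLossTerm`, and `StepCounter` are hypothetical classes, and the two hooks shown are a reduced set chosen for illustration.

```python
# Toy stand-in for a pipeline of optimization components.
# All class names are hypothetical; nothing here is a Neural Compressor API.

class ExtraLossTerm:
    """Distillation-like component: adds a fixed extra term to the loss."""
    def __init__(self, extra):
        self.extra = extra

    def on_step_begin(self, step):
        pass

    def on_after_compute_loss(self, loss):
        return loss + self.extra


class StepCounter:
    """Pruning-like component: only tracks how many steps it has seen."""
    def __init__(self):
        self.steps = 0

    def on_step_begin(self, step):
        self.steps += 1

    def on_after_compute_loss(self, loss):
        return loss


class ToyPipeline:
    """Holds components in append order and forwards each hook to all of them."""
    def __init__(self):
        self.components = []

    def append(self, component):
        self.components.append(component)

    def on_step_begin(self, step):
        for component in self.components:
            component.on_step_begin(step)

    def on_after_compute_loss(self, loss):
        # Each component receives the loss produced by the previous one.
        for component in self.components:
            loss = component.on_after_compute_loss(loss)
        return loss


pipeline = ToyPipeline()
counter = StepCounter()
pipeline.append(ExtraLossTerm(extra=0.25))
pipeline.append(counter)

for step in range(3):
    pipeline.on_step_begin(step)
    loss = pipeline.on_after_compute_loss(1.0)
```

Because every hook is forwarded in append order, the training loop only talks to one object, no matter how many techniques are combined.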

The following example runs distillation and pruning in a one-shot way:

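from neural_compressor.training import prepare_compression
from neural_compressor.config import DistillationConfig, KnowledgeDistillationLossConfig, WeightPruningConfig

# `model`, `dataloader`, `optimizer`, and `epochs` are assumed to be defined by the user.
# Build one config per technique, then hand both to a single compression manager.
distillation_criterion = KnowledgeDistillationLossConfig()
# Note (assumption): the first argument of DistillationConfig is expected to be the
# teacher model; pass your teacher here if it differs from `model`.
d_conf = DistillationConfig(model, distillation_criterion)
p_conf = WeightPruningConfig()
compression_manager = prepare_compression(model=model, confs=[d_conf, p_conf])

# Training loop. Hooks are called through `compression_manager.callbacks`,
# matching the first hook call in the original snippet; on_train_begin is called once.
compression_manager.callbacks.on_train_begin()
for epoch in range(epochs):
    compression_manager.callbacks.on_epoch_begin(epoch)
    for i, batch in enumerate(dataloader):
        compression_manager.callbacks.on_step_begin(i)
        ...  # elided in the original
        output = model(batch)
        loss = ...  # task loss computation elided in the original
        # Let the compression components adjust the loss (e.g. add the distillation term).
        loss = compression_manager.callbacks.on_after_compute_loss(batch, output, loss)
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_epoch_end()
compression_manager.callbacks.on_train_end()

model.save('./path/to/save')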

Examples

Orchestration Examples
