BladeDISC Debugging Tips

Tanyo Kwok edited this page Dec 26, 2022 · 8 revisions

This document collects tips for debugging the DISC compilation pipeline.

PyTorch Users

Debug PyTorch to Mhlo Conversion Pass Pipeline

export TF_CPP_VMODULE=disc_compiler=1

This flag may help if you encounter the following log pattern:

...
<unknown>:0: error: TorchBackendToMhloBackendPipeline failed
<unknown>:0: note: see current operation:
"builtin.module"() ({
  "func.func"() ({

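To confirm that a run hit this failure, it can help to capture the output to a file and search it. The following is a minimal sketch: the function name `find_mhlo_failures`, the script name `my_model.py`, and the log file name `compile.log` are placeholders for illustration and are not part of BladeDISC.

```shell
# Print the numbered lines of a captured log that report the
# Torch-to-MHLO pipeline failure shown above.
# find_mhlo_failures is a hypothetical helper, not a BladeDISC tool.
find_mhlo_failures() {
  grep -n 'TorchBackendToMhloBackendPipeline failed' "$1"
}

# Example usage (my_model.py and compile.log are placeholder names):
#   export TF_CPP_VMODULE=disc_compiler=1
#   python my_model.py > compile.log 2>&1
#   find_mhlo_failures compile.log
```

The verbose output enabled by the flag lands in the same file, so the lines just before each match are usually the place to start reading.
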
Debug TorchBlade Pass Pipeline via LTC-DISC Backend

export PYTORCH_JIT_LOG_LEVEL=">>>disc_compiler:>>>register_disc_class"

This flag prints the pass pipeline on the LTC-DISC backend. You can check sub-graph fusion or DISC compilation status in the log.

The log also contains lines like the one below. The file that the command redirects to, /tmp/xxx/disc.mlir.log, holds the DISC compilation pass pipeline output:

[DEBUG register_disc_class.cpp:152] disc compile fusionGroup prim::FusionGroup cmd: TF_CPP_VMODULE=disc_compiler=1 /workspace/BladeDISC/pytorch_blade/torch_blade/disc_compiler_main /tmp/140485835044672-89945-1667912380994240/disc.mlir /tmp/140485835044672-89945-1667912380994240/disc.mlir.out > /tmp/140485835044672-89945-1667912380994240/disc.mlir.log 2>&1
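
When a model produces many fusion groups, picking the log paths out by hand gets tedious. The sketch below pulls every disc.mlir.log path out of a captured JIT log. It assumes the debug lines keep the shape shown above; the function name `extract_disc_logs` and the file name `jit.log` are placeholders, not part of BladeDISC.

```shell
# List the disc.mlir.log paths mentioned in "disc compile fusionGroup" lines.
# extract_disc_logs is a hypothetical helper, not a BladeDISC tool.
extract_disc_logs() {
  # First grep keeps only the compile command lines; the second prints
  # just the space-delimited token that ends in /disc.mlir.log.
  grep 'disc compile fusionGroup' "$1" | grep -o '[^ ]*/disc\.mlir\.log'
}

# Example usage (jit.log is a placeholder for your captured output):
#   extract_disc_logs jit.log
```

Each printed path can then be opened directly to inspect the pass pipeline for that fusion group.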

Debug the TorchBlade Pass Pipeline with Python API

export TORCH_BLADE_DEBUG_LOG=true
export TORCH_BLADE_MHLO_DEBUG_LOG=true

After enabling these flags, TorchBlade dumps the TorchScript graph and DISC compilation logs to the dump_dir folder in the current working directory.
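
Exporting the flags affects every later command in the shell session. If you would rather enable them for a single run, one option is a small wrapper like the sketch below. The function name `run_with_blade_debug` and the script name `my_script.py` are placeholders for illustration and are not part of TorchBlade.

```shell
# Run one command with both TorchBlade debug flags set, without
# exporting them into the rest of the shell session.
# run_with_blade_debug is a hypothetical helper, not a TorchBlade tool.
run_with_blade_debug() {
  env TORCH_BLADE_DEBUG_LOG=true TORCH_BLADE_MHLO_DEBUG_LOG=true "$@"
}

# Example usage (my_script.py is a placeholder name):
#   run_with_blade_debug python my_script.py
#   ls -R dump_dir
```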

Debug the precision errors of TorchBlade

If you find accuracy issues, first enable the accuracy check with the following environment flag:

export TORCH_BLADE_DEBUG_ENABLE_ERROR_FALLBACK=true

The accuracy check tells you which cluster runs into problems. You can then use the following environment flags to dump and replay the cluster submodules:

export TORCH_DISC_ENABLE_REPLAY_ON_CLUSTER=true
export TORCH_BLADE_DEBUG_ENABLE_ERROR_FALLBACK=true

TensorFlow Users

TBD