BladeDISC Debugging Tips
This document collects debugging tips that help users debug the DISC compilation pipeline.
export TF_CPP_VMODULE=disc_compiler=1
This flag may help if you see the following log pattern:
...
<unknown>:0: error: TorchBackendToMhloBackendPipeline failed
<unknown>:0: note: see current operation:
"builtin.module"() ({
"func.func"() ({
export PYTORCH_JIT_LOG_LEVEL=">>>disc_compiler:>>>register_disc_class"
This flag prints the pass pipeline on the LTC-DISC backend, so you can check sub-graph fusion or DISC compilation status in the log.
The log also contains lines like the following one. The redirect at the end of the logged command names the file that holds the DISC compilation pass pipeline output, /tmp/xxx/disc.mlir.log:
[DEBUG register_disc_class.cpp:152] disc compile fusionGroup prim::FusionGroup cmd: TF_CPP_VMODULE=disc_compiler=1 /workspace/BladeDISC/pytorch_blade/torch_blade/disc_compiler_main /tmp/140485835044672-89945-1667912380994240/disc.mlir /tmp/140485835044672-89945-1667912380994240/disc.mlir.out > /tmp/140485835044672-89945-1667912380994240/disc.mlir.log 2>&1
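When many fusion groups are compiled, it can be tedious to pick the log paths out of the JIT output by hand. The following is a minimal sketch, not part of BladeDISC: it assumes you captured the process output to a file, and that each logged command redirects to a path ending in disc.mlir.log, as in the sample line above. The function name disc_log_paths and the capture file name are hypothetical.

```shell
# Hypothetical helper (not a BladeDISC tool): print each disc.mlir.log path
# mentioned in a captured log file, once per path.
# Assumption: the logged command ends with "> <dir>/disc.mlir.log 2>&1".
disc_log_paths() {
  # grep -o keeps only the "> <path>" match; cut drops the leading "> ".
  grep -o '> [^ ]*disc\.mlir\.log' "$1" | cut -c3- | sort -u
}
```

For example, after running your script with `2> jit.log`, `disc_log_paths jit.log` lists the log files to inspect.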
export TORCH_BLADE_DEBUG_LOG=true
export TORCH_BLADE_MHLO_DEBUG_LOG=true
After enabling these flags, TorchBlade dumps the TorchScript graph and DISC compilation logs to the dump_dir folder in the current working directory.
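If you would rather not leave the debug flags exported in your shell, one option is to set them for a single command only. This is a sketch under that assumption; the wrapper name with_blade_debug is hypothetical, and it uses only the two flags listed above.

```shell
# Hypothetical wrapper: run one command with both TorchBlade debug flags set,
# without exporting them into the current shell session.
with_blade_debug() {
  env TORCH_BLADE_DEBUG_LOG=true TORCH_BLADE_MHLO_DEBUG_LOG=true "$@"
}
```

A call such as `with_blade_debug python my_script.py` (my_script.py stands in for your own entry point) should then leave the dumps under dump_dir in the directory you ran it from.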
If you find accuracy issues, you can first enable the accuracy check with the following env flag:
export TORCH_BLADE_DEBUG_ENABLE_ERROR_FALLBACK=true
The accuracy check tells you which cluster runs into problems. You can then use the following env flags to dump and replay the cluster submodules:
export TORCH_DISC_ENABLE_REPLAY_ON_CLUSTER=true
export TORCH_BLADE_DEBUG_ENABLE_ERROR_FALLBACK=true
TBD