Model conversion failure on the MTK platform when adapting the Qwen 2.5 0.5B model #6228

Open
tiger-of-shawn opened this issue Oct 15, 2024 · 3 comments
Labels
module: mediatek Issues related to the Mediatek delegate

Comments

@tiger-of-shawn

some logs:

#source shell_scripts/export_llama.sh qwen2 "" "" "" llama3.txt

checkpoint_files: ['models/llm_models/weights/Qwen2.5-0.5B-Instruct/model.safetensors']
Preparing Model Calibration Inputs...
Exporting Chunk 0 to PTE
Getting pre autograd ATen Dialect Graph
model info: Qwen2ModelChunk(
  (layers): ModuleList(
    (0-23): 24 x Qwen2DecoderLayer(
      (self_attn): Qwen2Attention(
        (q_proj): Linear(in_features=896, out_features=896, bias=True)
        (k_proj): Linear(in_features=896, out_features=128, bias=True)
        (v_proj): Linear(in_features=896, out_features=128, bias=True)
        (o_proj): Linear(in_features=896, out_features=896, bias=False)
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
        (down_proj): Linear(in_features=4864, out_features=896, bias=False)
        (up_proj): Linear(in_features=896, out_features=4864, bias=False)
      )
      (input_norm): RMSNorm()
      (post_attention_norm): RMSNorm()
    )
  )
  (norm): RMSNorm()
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)

W1015 10:29:36.177991 578378 torch/_export/__init__.py:64] +============================+
W1015 10:29:36.178128 578378 torch/_export/__init__.py:65] |     !!!   WARNING   !!!    |
W1015 10:29:36.178169 578378 torch/_export/__init__.py:66] +============================+
W1015 10:29:36.178198 578378 torch/_export/__init__.py:67] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W1015 10:29:36.178226 578378 torch/_export/__init__.py:68] Please switch to use torch.export.export_for_training instead.
Batch: 100%|██████████| 10/10 [00:05<00:00, 1.86it/s]
Calibrating Model: 100%|██████████| 1/1 [00:13<00:00, 13.90s/it]
Getting ATen Dialect Graph
Exporting Shape 128t512c to:
pte/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c_0.pte
example_input shape: torch.Size([1, 128, 896])
Lowering to Edge Dialect Graph
Delegating Edge Program to Neuropilot Backend

Traceback (most recent call last):
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 491, in <module>
    main()
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 477, in main
    export_to_et_ir(
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 362, in export_to_et_ir
    delegated_program = edge_program.to_backend(partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1288, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 387, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 310, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 249, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 113, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/backends/mediatek/preprocess.py", line 68, in preprocess
    model_bytes = mtk_neuron.compile(mlir_str, " ".join(compile_options))
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/mtk_neuron/mtk_neuron.py", line 127, in compile
    raise RuntimeError(f'Compile error:\n{status["log"]}')
RuntimeError: Compile error:
NIR[1761]: FullyConnectedLayer
├ MDLA: Dimension should be <= 65535. Operand: 1 got <151936 x 896>.
├ MDLA: Dimension should be <= 65535. Result : 0 got <128 x 151936>.
├ EDPA: unsupported operation
WARNING: Failed to process the supernode.
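
For context, the failing node is the lm_head projection (896 → 151936 from the model dump above): its vocabulary dimension of 151936 exceeds the MDLA per-dimension limit of 65535. Below is a minimal sketch of one possible workaround, splitting the projection along the output axis so each piece fits the limit. This is my own illustration, not the MediaTek delegate's actual fix; `ChunkedLMHead`, `MDLA_MAX_DIM`, and the chunk count of 4 are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: split the 896 -> 151936 lm_head along the vocabulary
# axis so every Linear stays under the MDLA limit reported in the log.
MDLA_MAX_DIM = 65535

class ChunkedLMHead(nn.Module):
    def __init__(self, hidden: int, vocab: int, num_chunks: int):
        super().__init__()
        assert vocab % num_chunks == 0
        chunk = vocab // num_chunks
        assert chunk <= MDLA_MAX_DIM, "each chunk must fit the MDLA limit"
        self.heads = nn.ModuleList(
            nn.Linear(hidden, chunk, bias=False) for _ in range(num_chunks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the partial logits back into the full vocab dimension.
        return torch.cat([head(x) for head in self.heads], dim=-1)

# 151936 = 4 * 37984, and 37984 <= 65535
head = ChunkedLMHead(hidden=896, vocab=151936, num_chunks=4)
logits = head(torch.randn(1, 128, 896))  # matches the exported 128t shape
print(logits.shape)  # torch.Size([1, 128, 151936])
```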

@kirklandsign kirklandsign added the module: mediatek Issues related to the Mediatek delegate label Oct 16, 2024
@tiger-of-shawn
Author

The error seems to be related to 'tie_word_embeddings'. I will try to work on a fix soon.
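
For anyone reproducing this, a quick way to confirm the tying is via the Hugging Face config. A minimal check, assuming the standard transformers API; the checkpoint id below mirrors the weights path from the log:

```python
from transformers import AutoConfig, AutoModelForCausalLM

ckpt = "Qwen/Qwen2.5-0.5B-Instruct"
config = AutoConfig.from_pretrained(ckpt)
print(config.tie_word_embeddings)  # expected True for the 0.5B checkpoint

# With tied weights, lm_head shares its tensor with the input embedding,
# so the exporter still sees the full 151936 x 896 matrix in one layer.
model = AutoModelForCausalLM.from_pretrained(ckpt)
print(model.lm_head.weight is model.get_input_embeddings().weight)
```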

@neuropilot-captain
Contributor

Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

@tiger-of-shawn
Author

> Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

Thank you for your response; it’s working perfectly now.

I have run the sample application on an MTK 9000: prefill 990 tokens/s, decode 61 tokens/s.

I 00:00:01.045045 executorch:mtk_llama_executor_runner.cpp:194] Done analyzing prompt in 0.129182 sec (990.850118 tok/s)
I 00:00:04.956007 executorch:mtk_llama_executor_runner.cpp:296] Token generation speed: 61.639103 tok/s
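
As a sanity check, the prefill figure is internally consistent: 990.850118 tok/s × 0.129182 s ≈ 128 tokens, which matches the 128t prefill shape exported earlier in the thread.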
