Model conversion failure on the MTK platform when adapting the Qwen 2.5 0.5B model #6228

Open
tiger-of-shawn opened this issue Oct 15, 2024 · 3 comments
Labels
module: mediatek Issues related to the Mediatek delegate

Comments

@tiger-of-shawn

some logs:

#source shell_scripts/export_llama.sh qwen2 "" "" "" llama3.txt

checkpoint_files: ['models/llm_models/weights/Qwen2.5-0.5B-Instruct/model.safetensors']
Preparing Model Calibration Inputs...
Exporting Chunk 0 to PTE
Getting pre autograd ATen Dialect Graph
model info: Qwen2ModelChunk(
  (layers): ModuleList(
    (0-23): 24 x Qwen2DecoderLayer(
      (self_attn): Qwen2Attention(
        (q_proj): Linear(in_features=896, out_features=896, bias=True)
        (k_proj): Linear(in_features=896, out_features=128, bias=True)
        (v_proj): Linear(in_features=896, out_features=128, bias=True)
        (o_proj): Linear(in_features=896, out_features=896, bias=False)
      )
      (mlp): Qwen2MLP(
        (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
        (down_proj): Linear(in_features=4864, out_features=896, bias=False)
        (up_proj): Linear(in_features=896, out_features=4864, bias=False)
      )
      (input_norm): RMSNorm()
      (post_attention_norm): RMSNorm()
    )
  )
  (norm): RMSNorm()
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)

W1015 10:29:36.177991 578378 torch/_export/__init__.py:64] +============================+
W1015 10:29:36.178128 578378 torch/_export/__init__.py:65] |     !!!   WARNING   !!!    |
W1015 10:29:36.178169 578378 torch/_export/__init__.py:66] +============================+
W1015 10:29:36.178198 578378 torch/_export/__init__.py:67] capture_pre_autograd_graph() is deprecated and doesn't provide any function guarantee moving forward.
W1015 10:29:36.178226 578378 torch/_export/__init__.py:68] Please switch to use torch.export.export_for_training instead.
Batch: 100%|██████████| 10/10 [00:05<00:00, 1.86it/s]
Calibrating Model: 100%|██████████| 1/1 [00:13<00:00, 13.90s/it]
Getting ATen Dialect Graph
Exporting Shape 128t512c to:
pte/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c/Qwen2.5-0.5B-Instruct_A16W4_1_chunks_128t512c_0.pte
example_input shape: torch.Size([1, 128, 896])
Lowering to Edge Dialect Graph
Delegating Edge Program to Neuropilot Backend

Traceback (most recent call last):
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 491, in <module>
    main()
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 477, in main
    export_to_et_ir(
  File "/home/qwen/executorch/examples/mediatek/model_export_scripts/qwen2.py", line 362, in export_to_et_ir
    delegated_program = edge_program.to_backend(partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/program/_program.py", line 1288, in to_backend
    new_edge_programs[name] = to_backend(program, partitioner)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 387, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 310, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 249, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/exir/backend/backend_api.py", line 113, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/executorch/backends/mediatek/preprocess.py", line 68, in preprocess
    model_bytes = mtk_neuron.compile(mlir_str, " ".join(compile_options))
  File "/home/qwen/miniconda3/envs/et_qnn_2/lib/python3.10/site-packages/mtk_neuron/mtk_neuron.py", line 127, in compile
    raise RuntimeError(f'Compile error:\n{status["log"]}')
RuntimeError: Compile error:
NIR[1761]: FullyConnectedLayer
├ MDLA: Dimension should be <= 65535. Operand: 1 got <151936 x 896>.
├ MDLA: Dimension should be <= 65535. Result : 0 got <128 x 151936>.
├ EDPA: unsupported operation
WARNING: Failed to process the supernode.
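
For context, the failing node is the lm_head projection (896 → 151936 from the model dump above): its vocabulary dimension of 151936 exceeds the MDLA per-dimension limit of 65535. Below is a minimal sketch of one possible workaround, splitting the projection along the output axis so each piece fits the limit. This is my own illustration, not the MediaTek delegate's actual fix; `ChunkedLMHead`, `MDLA_MAX_DIM`, and the chunk count of 4 are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: split the 896 -> 151936 lm_head along the vocabulary
# axis so every Linear stays under the MDLA limit reported in the log.
MDLA_MAX_DIM = 65535

class ChunkedLMHead(nn.Module):
    def __init__(self, hidden: int, vocab: int, num_chunks: int):
        super().__init__()
        assert vocab % num_chunks == 0
        chunk = vocab // num_chunks
        assert chunk <= MDLA_MAX_DIM, "each chunk must fit the MDLA limit"
        self.heads = nn.ModuleList(
            nn.Linear(hidden, chunk, bias=False) for _ in range(num_chunks)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the partial logits back into the full vocab dimension.
        return torch.cat([head(x) for head in self.heads], dim=-1)

# 151936 = 4 * 37984, and 37984 <= 65535
head = ChunkedLMHead(hidden=896, vocab=151936, num_chunks=4)
logits = head(torch.randn(1, 128, 896))  # matches the exported 128t shape
print(logits.shape)  # torch.Size([1, 128, 151936])
```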

@kirklandsign kirklandsign added the module: mediatek Issues related to the Mediatek delegate label Oct 16, 2024
@tiger-of-shawn
Author

The error seems to be related to 'tie_word_embeddings'. I will try to work on a fix soon.
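
For anyone reproducing this, a quick way to confirm the tying is via the Hugging Face config. A minimal check, assuming the standard transformers API; the checkpoint id below mirrors the weights path from the log:

```python
from transformers import AutoConfig, AutoModelForCausalLM

ckpt = "Qwen/Qwen2.5-0.5B-Instruct"
config = AutoConfig.from_pretrained(ckpt)
print(config.tie_word_embeddings)  # expected True for the 0.5B checkpoint

# With tied weights, lm_head shares its tensor with the input embedding,
# so the exporter still sees the full 151936 x 896 matrix in one layer.
model = AutoModelForCausalLM.from_pretrained(ckpt)
print(model.lm_head.weight is model.get_input_embeddings().weight)
```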

@neuropilot-captain
Contributor

Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

@tiger-of-shawn
Author

> Hi, @tiger-of-shawn, thanks for your feedback! We have released the latest NeuroPilot Express SDK for ExecuTorch. This update includes optimizations specifically addressing the issue you highlighted. Please give it a try!

Thank you for your response; it’s working perfectly now.

I have run the sample application on an MTK 9000: prefill 990 tokens/s, decode 61 tokens/s.

I 00:00:01.045045 executorch:mtk_llama_executor_runner.cpp:194] Done analyzing prompt in 0.129182 sec (990.850118 tok/s)
I 00:00:04.956007 executorch:mtk_llama_executor_runner.cpp:296] Token generation speed: 61.639103 tok/s
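
As a sanity check, the prefill figure is internally consistent: 990.850118 tok/s × 0.129182 s ≈ 128 tokens, which matches the 128t prefill shape exported earlier in the thread.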
