
[BUG] An AssertionError occurs when running the Docker container (the container reports the error below) #1330

Open · 2 tasks done
jydsun opened this issue Nov 18, 2024 · 1 comment
jydsun commented Nov 18, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

```
==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00, 1.35s/it]
Running on local URL: http://0.0.0.0:80

To create a public link, set share=True in launch().
User: 柔柔弱弱
Traceback (most recent call last):
  File "<string>", line 21, in rotary_kernel
KeyError: ('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float32, torch.float32, torch.float32, torch.float32, None, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, False, False, False, False, 4), (True, True, True, True, (False,), (True, False), (False, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.8/dist-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "web_demo.py", line 126, in predict
    for response in model.chat_stream(tokenizer, _query, history=_task_history, generation_config=config):
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1216, in stream_generator
    for token in self.generate_stream(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/usr/local/lib/python3.8/dist-packages/transformers_stream_generator/main.py", line 931, in sample_stream
    outputs = self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1045, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 893, in forward
    outputs = block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 612, in forward
    attn_outputs = self.attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 432, in forward
    query = apply_rotary_pos_emb(query, q_pos_emb)
  File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1344, in apply_rotary_pos_emb
    return apply_rotary_emb_func(t_float, cos, sin).type_as(t)
  File "/usr/local/lib/python3.8/dist-packages/flash_attn/layers/rotary.py", line 122, in apply_rotary_emb
    return ApplyRotaryEmb.apply(
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.8/dist-packages/flash_attn/layers/rotary.py", line 48, in forward
    out = apply_rotary(
  File "/usr/local/lib/python3.8/dist-packages/flash_attn/ops/triton/rotary.py", line 213, in apply_rotary
    rotary_kernel[grid](
  File "<string>", line 41, in rotary_kernel
  File "/usr/local/lib/python3.8/dist-packages/triton/compiler.py", line 1629, in compile
    metadata["name"] = ptx_get_kernel_name(next_module)
  File "/usr/local/lib/python3.8/dist-packages/triton/compiler.py", line 1040, in ptx_get_kernel_name
    assert ptx
AssertionError
```
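
The root failure is the last frame: Triton's `ptx_get_kernel_name` asserts that the PTX it just generated for the JIT-compiled `rotary_kernel` is non-empty, so compilation produced no PTX at all. That is commonly (though not confirmed here) a mismatch between the bundled Triton build, the CUDA 11.7 toolkit in the image, and the host driver / GPU compute capability. A minimal sketch to record the relevant versions inside the container, using only stock PyTorch and Triton APIs:

```python
# Hedged diagnostic sketch: dump the PyTorch / Triton / CUDA stack and the
# visible GPU. An empty PTX from Triton's compiler is often a toolchain or
# compute-capability mismatch -- that diagnosis is an assumption, not confirmed.
import torch
import triton

print("torch       :", torch.__version__)
print("torch CUDA  :", torch.version.cuda)
print("triton      :", triton.__version__)
print("CUDA usable :", torch.cuda.is_available())
print("device      :", torch.cuda.get_device_name(0))
print("compute cap :", torch.cuda.get_device_capability(0))
```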

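Separately, the startup log's own hint ("please manually add bf16/fp16/fp32=True to AutoModelForCausalLM.from_pretrained") points at one way around the crash: load the model without the flash-attention fast path so the Triton rotary kernel is never compiled. A minimal sketch, assuming this checkpoint honors Qwen's `use_flash_attn` remote-code option; the model path is hypothetical:

```python
# Hedged workaround sketch: load the Qwen checkpoint with flash-attention
# disabled so flash_attn/ops/triton/rotary.py is never JIT-compiled.
# `use_flash_attn` is a Qwen remote-code config option; whether this exact
# checkpoint honors it is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-7B-Chat"  # hypothetical; substitute the local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    trust_remote_code=True,
    bf16=True,             # explicit precision, as the startup log suggests
    use_flash_attn=False,  # skip the Triton rotary kernel entirely (assumed option)
).eval()
```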

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
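
The CUDA line above already shows how to query one field; a small sketch extending the same idea to the rest of the template, using only standard attributes of each package:

```python
# Hedged helper: print every environment field requested by the template.
# All attributes used here are standard in their respective packages.
import platform
import torch
import transformers

print("OS          :", platform.platform())
print("Python      :", platform.python_version())
print("Transformers:", transformers.__version__)
print("PyTorch     :", torch.__version__)
print("CUDA        :", torch.version.cuda)
```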

Anything else?

No response

jydsun commented Nov 18, 2024

(Screenshots attached: 微信图片_20241118164724.png, 微信图片_20241118165342.png)

Additional information: I spent a whole day on this without finding the cause. Could an expert please advise how to resolve it?
