Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
==========
== CUDA ==
==========

CUDA Version 11.7.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00, 1.35s/it]
Running on local URL: http://0.0.0.0:80
To create a public link, set `share=True` in `launch()`.
User: 柔柔弱弱
Traceback (most recent call last):
File "", line 21, in rotary_kernel
KeyError: ('2-.-0-.-0-d82511111ad128294e9d31a6ac684238-d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float32, torch.float32, torch.float32, torch.float32, None, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, False, False, False, False, 4), (True, True, True, True, (False,), (True, False), (False, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.8/dist-packages/gradio/route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1935, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 1532, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 671, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 664, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 647, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.8/dist-packages/gradio/utils.py", line 809, in gen_wrapper
response = next(iterator)
File "web_demo.py", line 126, in predict
for response in model.chat_stream(tokenizer, _query, history=_task_history, generation_config=config):
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1216, in stream_generator
for token in self.generate_stream(
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/usr/local/lib/python3.8/dist-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1045, in forward
transformer_outputs = self.transformer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 893, in forward
outputs = block(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 612, in forward
attn_outputs = self.attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 432, in forward
query = apply_rotary_pos_emb(query, q_pos_emb)
File "/root/.cache/huggingface/modules/transformers_modules/modeling_qwen.py", line 1344, in apply_rotary_pos_emb
return apply_rotary_emb_func(t_float, cos, sin).type_as(t)
File "/usr/local/lib/python3.8/dist-packages/flash_attn/layers/rotary.py", line 122, in apply_rotary_emb
return ApplyRotaryEmb.apply(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/usr/local/lib/python3.8/dist-packages/flash_attn/layers/rotary.py", line 48, in forward
out = apply_rotary(
File "/usr/local/lib/python3.8/dist-packages/flash_attn/ops/triton/rotary.py", line 213, in apply_rotary
rotary_kernel[grid](
File "", line 41, in rotary_kernel
File "/usr/local/lib/python3.8/dist-packages/triton/compiler.py", line 1629, in compile
metadata["name"] = ptx_get_kernel_name(next_module)
File "/usr/local/lib/python3.8/dist-packages/triton/compiler.py", line 1040, in ptx_get_kernel_name
assert ptx
AssertionError
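Reading the chain above: the flash-attn rotary path launches a Triton-compiled `rotary_kernel`, the `KeyError` is a JIT cache miss that triggers compilation, and the compile then dies at `assert ptx`, i.e. Triton produced no PTX in this environment. A minimal workaround sketch, taking the precision hint from the startup log and assuming Qwen's remote code honors a `use_flash_attn` override (both kwargs are assumptions for this checkpoint, not a confirmed fix):

```python
# Hedged workaround sketch, not a confirmed fix. Two ideas from the log above:
#   1. pass an explicit precision flag, as the startup message suggests;
#   2. set use_flash_attn=False (assumption: Qwen's remote code reads this
#      config override) so rotary embeddings fall back to the pure-PyTorch
#      path instead of the Triton kernel that fails to compile here.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "Qwen/Qwen-14B-Chat"  # assumption: whichever checkpoint web_demo.py loads

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    trust_remote_code=True,
    bf16=True,             # explicit precision, per the startup message
    use_flash_attn=False,  # skip the flash-attn / Triton rotary path
).eval()
```

Disabling the flash-attn path trades some inference speed for skipping the Triton JIT entirely, which is where this traceback dies.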
Expected Behavior
No response
Steps To Reproduce
No response
Environment
Anything else?
No response