[Bug]: benchmark_latency.py cannot exit when using tp #197

Open
JunxiChhen opened this issue Aug 21, 2024 · 4 comments
Labels: bug (Something isn't working), intel (Issues or PRs submitted by Intel)

Comments

@JunxiChhen

Your current environment

Command line:

cd vllm-fork/benchmarks
python benchmark_latency.py \
    --model meta-llama/Meta-Llama-3-8B \
    --dtype bfloat16 \
    --output-len 128 \
    --num-iters 1 \
    --num-iters-warmup 1 \
    --trust-remote-code \
    --batch-size 256 \
    --device hpu \
    --block-size 128 \
    --input-len 1024 \
    -tp 2

🐛 Describe the bug

The script above never exits on its own; it hangs until it is stopped manually with Ctrl+C.

JunxiChhen added the bug (Something isn't working) label on Aug 21, 2024
@JunxiChhen (author)

Tested on tag 0.5.3.post1-Gaudi-1.17.0.

@kzawora-intel

Thanks for the report! I've investigated this issue before; unfortunately, it's a bug in HCCL and beyond vLLM's control, because the deadlock occurs after the main function has already returned. We are working with the HCCL team to resolve it.

While it's not ideal, you can exit by adding the following workaround after the benchmark:

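import os, sys
sys.stdout.flush()  # os._exit() does not flush stdio buffers, so flush the benchmark results first
os._exit(0)  # terminate immediately, bypassing the interpreter teardown where the deadlock occurs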
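
To illustrate why os._exit(0) helps, here is a minimal, self-contained sketch. It does not involve HCCL at all: as an assumption, it imitates the symptom with a non-daemon thread that never finishes, because CPython waits for such threads during shutdown, after main() has returned. The script name, the --hard-exit flag, and the helper run_script are hypothetical and exist only for this demonstration.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for the benchmark: main() finishes, but a non-daemon
# thread blocked forever keeps the interpreter from completing its shutdown.
# This only imitates the symptom; it is not how HCCL actually deadlocks.
STUCK_SCRIPT = textwrap.dedent("""
    import os, sys, threading

    def main():
        # Simulated background worker that never returns.
        threading.Thread(target=threading.Event().wait).start()
        print("benchmark done", flush=True)

    main()
    if "--hard-exit" in sys.argv:
        sys.stdout.flush()
        os._exit(0)  # skip interpreter teardown, including the thread join
""")


def run_script(hard_exit: bool, timeout: float = 5.0) -> str:
    """Run the simulated benchmark in a child process.

    Returns "exited" if the child terminates within `timeout` seconds and
    "hung" if it has to be killed. Raises CalledProcessError on a nonzero exit.
    """
    args = [sys.executable, "-c", STUCK_SCRIPT]
    if hard_exit:
        args.append("--hard-exit")
    try:
        # On timeout, subprocess.run kills the child before re-raising.
        subprocess.run(args, timeout=timeout, capture_output=True, check=True)
        return "exited"
    except subprocess.TimeoutExpired:
        return "hung"


if __name__ == "__main__":
    print("without workaround:", run_script(hard_exit=False))
    print("with os._exit(0):  ", run_script(hard_exit=True))
```

On CPython, the first run should be reported as hung after the timeout, while the second exits immediately. The trade-off is the same as in the workaround above: os._exit() skips atexit handlers and buffer flushing, so anything that must be persisted has to be written out explicitly beforehand.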

I will update this issue once we have an HCCL fix.

@michalkuligowski

Testing possible fix #379

@michalkuligowski

Hi, can you please try the latest release, v0.5.3.post1+Gaudi-1.18.0, with SynapseAI 1.18.0?
