[Bug]: benchmark_latency.py cannot exit when using tp #197

JunxiChhen · 2024-08-21T07:37:23Z

Your current environment

Command line:

cd vllm-fork/benchmarks
python benchmark_latency.py \
    --model meta-llama/Meta-Llama-3-8B \
    --dtype bfloat16 \
    --output-len 128 \
    --num-iters 1 \
    --num-iters-warmup 1 \
    --trust-remote-code \
    --batch-size 256 \
    --device hpu \
    --block-size 128 \
    --input-len 1024 \
    -tp 2

🐛 Describe the bug

The script above never exits; it hangs until it is stopped manually with Ctrl+C.


JunxiChhen · 2024-08-22T05:18:59Z

The tested tag is 0.5.3.post1-Gaudi-1.17.0.

kzawora-intel · 2024-08-26T14:55:03Z

Thanks for the report! I've investigated this issue before and unfortunately, it's a bug in HCCL and is beyond vLLM's control, as the deadlock occurs after the main function exits. We are working with HCCL to resolve this.

While it's not ideal, you can exit by adding the following workaround after the benchmark:

import os
os._exit(0)

I will update this issue once we have an HCCL fix.
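One caveat with the workaround above: `os._exit()` terminates the process immediately, without running `atexit` handlers or flushing stdio buffers, so the benchmark's final output can be lost if stdout is redirected to a file or pipe. The following is a minimal sketch of a wrapper that flushes first. The helper name `run_then_hard_exit` and its `exit_fn` parameter are hypothetical and not part of vLLM; the wrapper only illustrates the idea.

```python
import os
import sys
from typing import Callable


def run_then_hard_exit(
    benchmark: Callable[[], None],
    exit_fn: Callable[[int], None] = os._exit,
) -> None:
    """Run `benchmark`, flush output, then exit without interpreter cleanup.

    Hypothetical helper: `exit_fn` defaults to os._exit and is only
    injectable so the wrapper can be exercised without killing the caller.
    """
    code = 0
    try:
        benchmark()
    except Exception as exc:
        # Report the failure ourselves; the usual traceback printing at
        # interpreter shutdown will not happen after os._exit().
        print(f"benchmark failed: {exc!r}", file=sys.stderr)
        code = 1
    finally:
        # os._exit() does not flush stdio buffers, so do it explicitly.
        sys.stdout.flush()
        sys.stderr.flush()
    # Skips atexit handlers and object finalizers, which is where the
    # hang described above occurs.
    exit_fn(code)
```

Assuming the benchmark script's entry point is a call like `main(args)`, it could be wrapped as `run_then_hard_exit(lambda: main(args))`. Because cleanup is skipped entirely, this is only suitable as the very last step of a script.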

michalkuligowski · 2024-10-11T12:13:29Z

Testing a possible fix in #379.

michalkuligowski · 2024-11-06T15:18:48Z

Hi, can you please try the latest release, v0.5.3.post1+Gaudi-1.18.0, with SynapseAI 1.18.0?

JunxiChhen added the bug Something isn't working label Aug 21, 2024

kzawora-intel added the intel Issues or PRs submitted by Intel label Aug 29, 2024

michalkuligowski mentioned this issue Sep 17, 2024

[Bug]: Using tensor parallel during offline inference causes the process to hang #220

Open
