Dev/afierka/mss acc fix #456

Draft: afierka-intel wants to merge 26 commits into habana_main. The changes shown below are from all commits.

Commits (26), all authored by afierka-intel:
4ce44c7  Add multi step scheduling scenario to jenkins CI (Oct 30, 2024)
f6ce404  Fix formatting (Oct 30, 2024)
8cfc040  Add debug print (Oct 30, 2024)
b23a186  Fix typo (Oct 30, 2024)
cd7fa7e  Add fp8 test to jenkins CI (#429) (Oct 30, 2024)
4a27360  Add multi step scheduling scenario to jenkins CI (Oct 30, 2024)
732e7a0  Fix typo (Oct 30, 2024)
66569c6  Cleanup (Oct 30, 2024)
87fcca3  Disable MSS (Oct 30, 2024)
69be7fa  Enable MSS (Oct 30, 2024)
74b5668  Move MSS tests to separate suite (Oct 31, 2024)
95b3e7d  [DEBUG] Disable all non-mss tests (Oct 31, 2024)
0a081d3  [DEBUG] num_scheduler_steps=2 (Oct 31, 2024)
177b9b2  [DEBUG] num_scheduler_steps=5 (Oct 31, 2024)
551c37a  [DEBUG] num_scheduler_steps=20 (Oct 31, 2024)
485347a  [DEBUG] num_scheduler_steps=40 (Oct 31, 2024)
150eb46  [DEBUG] num_scheduler_steps=64 (Oct 31, 2024)
3b32642  [DEBUG] num_scheduler_steps=128 (Oct 31, 2024)
0a6c54c  [DEBUG] Apply fix from PR #452 (Nov 4, 2024)
5e97e67  [DEBUG] Fix function order in cherry-picked code (Nov 4, 2024)
3bd98f8  [DEBUG] num_scheduler_steps=64 (Nov 4, 2024)
70e8ff3  [DEBUG] num_scheduler_steps=40 (Nov 4, 2024)
c214c82  [DEBUG] num_scheduler_steps=20 (Nov 4, 2024)
912587d  [DEBUG] num_scheduler_steps=10 (Nov 4, 2024)
19182b9  [DEBUG] num_scheduler_steps=5 (Nov 4, 2024)
47dc4e5  [DEBUG] num_scheduler_steps=2 (Nov 4, 2024)
Files changed (5)
16 changes: 16 additions & 0 deletions .jenkins/lm-eval-harness/configs/Meta-Llama-3.1-8B-Instruct-mss.yaml

@@ -0,0 +1,16 @@
+# FIXME(kzawora): these scores were generated using vLLM on HPU, we need to confirm them on HF
+# VLLM_SKIP_WARMUP=true bash run-lm-eval-gsm-cot-llama-vllm-baseline.sh -m "/mnt/weka/data/pytorch/llama3.1/Meta-Llama-3.1-8B-Instruct" -b 128 -l 1319 -f 8 -t 1
+model_name: "/mnt/weka/data/pytorch/llama3.1/Meta-Llama-3.1-8B-Instruct"
+tasks:
+- name: "gsm8k_cot_llama"
+  metrics:
+  - name: "exact_match,strict-match"
+    value: 0.8317
+  - name: "exact_match,flexible-extract"
+    value: 0.8355
+limit: null
+num_fewshot: 8
+dtype: "bfloat16"
+fewshot_as_multiturn: true
+apply_chat_template: true
+num_scheduler_steps: 2
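Note: each metrics entry above is a reference score that the harness compares against the freshly measured lm-eval result. A minimal sketch of such a check; the tolerance and the measured values here are illustrative assumptions, not output from this PR:

# Sketch of how reference scores from the yaml are typically validated;
# RTOL and the 'measured' numbers are made up for illustration.
import numpy as np

RTOL = 0.05
reference = {
    "exact_match,strict-match": 0.8317,
    "exact_match,flexible-extract": 0.8355,
}
measured = {  # hypothetical lm-eval output
    "exact_match,strict-match": 0.8309,
    "exact_match,flexible-extract": 0.8347,
}
for name, ground_truth in reference.items():
    assert np.isclose(ground_truth, measured[name], rtol=RTOL), \
        f"{name}: measured {measured[name]}, expected ~{ground_truth}"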
1 change: 1 addition & 0 deletions .jenkins/lm-eval-harness/configs/models-mss.txt
@@ -0,0 +1 @@
+Meta-Llama-3.1-8B-Instruct-mss.yaml
3 changes: 3 additions & 0 deletions .jenkins/lm-eval-harness/test_lm_eval_correctness.py
@@ -54,6 +54,9 @@ def launch_lm_eval(eval_config):
         model_args += ",quantization=inc," \
                       "kv_cache_dtype=fp8_inc," \
                       "weights_load_device=cpu"
+    if eval_config.get("num_scheduler_steps"):
+        model_args += \
+            f",num_scheduler_steps={eval_config.get('num_scheduler_steps')}"
     kwargs = {}
     if 'fewshot_as_multiturn' in eval_config:
         kwargs['fewshot_as_multiturn'] = eval_config['fewshot_as_multiturn']
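Note: the new branch simply forwards num_scheduler_steps from the yaml into vLLM's model_args string. A self-contained reproduction of just that plumbing; build_model_args and the base of the string are assumptions, not the harness's actual code:

# Hypothetical reproduction of the num_scheduler_steps plumbing above.
def build_model_args(eval_config: dict) -> str:
    model_args = f"pretrained={eval_config['model_name']}," \
                 f"dtype={eval_config.get('dtype', 'bfloat16')}"
    if eval_config.get("num_scheduler_steps"):
        # Mirrors the added branch: append the value as a vLLM engine arg.
        model_args += \
            f",num_scheduler_steps={eval_config.get('num_scheduler_steps')}"
    return model_args

# For the Meta-Llama-3.1-8B-Instruct-mss.yaml config above this prints:
# pretrained=/mnt/weka/data/pytorch/llama3.1/Meta-Llama-3.1-8B-Instruct,dtype=bfloat16,num_scheduler_steps=2
print(build_model_args({
    "model_name": "/mnt/weka/data/pytorch/llama3.1/Meta-Llama-3.1-8B-Instruct",
    "dtype": "bfloat16",
    "num_scheduler_steps": 2,
}))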
57 changes: 34 additions & 23 deletions .jenkins/test_config.yaml
@@ -1,29 +1,40 @@
 # test_config.yaml
 stages:
-  - name: test_gsm8k_small_models
+#  - name: test_gsm8k_small_models
+#    steps:
+#      - name: gsm8k_small_g3_tp1
+#        flavor: g3
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 1
+#      - name: gsm8k_small_g3_tp2
+#        flavor: g3.s
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 2
+#      - name: gsm8k_small_g2_tp1
+#        flavor: g2
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 1
+#      - name: gsm8k_small_g2_tp2
+#        flavor: g2.s
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 2
+#  - name: test_gsm8k_large_models
+#    steps:
+#      - name: gsm8k_large_g3_tp2
+#        flavor: g3.s
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-large.txt -t 2
+#      - name: gsm8k_large_g2_tp4
+#        flavor: g2.m
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-large.txt -t 4
+#  - name: test_gsm8k_fp8
+#    steps:
+#      - name: gsm8k_small_g3_tp1_fp8
+#        flavor: g3
+#        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-fp8.txt -t 1
+  - name: test_gsm8k_mss
     steps:
-      - name: gsm8k_small_g3_tp1
+      - name: gsm8k_small_g3_tp1_mss
         flavor: g3
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 1
-      - name: gsm8k_small_g3_tp2
-        flavor: g3.s
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 2
-      - name: gsm8k_small_g2_tp1
+        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-mss.txt -t 1
+      - name: gsm8k_small_g2_tp1_mss
         flavor: g2
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 1
-      - name: gsm8k_small_g2_tp2
-        flavor: g2.s
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-small.txt -t 2
-  - name: test_gsm8k_large_models
-    steps:
-      - name: gsm8k_large_g3_tp2
+        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-mss.txt -t 1
+      - name: gsm8k_small_g3_tp2_mss
         flavor: g3.s
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-large.txt -t 2
-      - name: gsm8k_large_g2_tp4
-        flavor: g2.m
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-large.txt -t 4
-  - name: test_gsm8k_fp8
-    steps:
-      - name: gsm8k_small_g3_tp1_fp8
-        flavor: g3
-        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-fp8.txt -t 1
+        command: cd .jenkins/lm-eval-harness && bash run-tests.sh -c configs/models-mss.txt -t 2
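Note: the stage layout above is plain YAML, where each stage groups named steps and each step binds an HPU flavor (g2, g3, g3.s, ...) to a shell command. A hypothetical walker for the file could look like the sketch below; the actual Jenkins pipeline that consumes it is not part of this diff:

# Hypothetical consumer of .jenkins/test_config.yaml, for illustration only.
import subprocess
import yaml  # PyYAML

with open(".jenkins/test_config.yaml") as f:
    config = yaml.safe_load(f)

for stage in config["stages"]:
    for step in stage["steps"]:
        # 'flavor' selects the Gaudi machine type; a real runner would
        # schedule the step on a matching agent instead of running locally.
        print(f"[{stage['name']}] {step['name']} (flavor={step['flavor']})")
        subprocess.run(step["command"], shell=True, check=True)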
27 changes: 24 additions & 3 deletions vllm/worker/hpu_model_runner.py
@@ -2109,6 +2109,19 @@ def execute_model(
             # we only want to pythonize in the last step
             sampling_metadata.skip_sampler_cpu_output = True
             self.model.model.sampler.include_gpu_probs_tensor = True
+        cache_orig_output_tokens_len: List[Dict] = []
+
+        def try_revert_dummy_output_tokens():
+            if len(cache_orig_output_tokens_len) > 0:
+                # Reuse the original output token ids length
+                for i, seq_group_metadata in enumerate(
+                        seq_group_metadata_list):
+                    for j, data in seq_group_metadata.seq_data.items():
+                        orig_output_tokens_len = \
+                            cache_orig_output_tokens_len[i][j]
+                        data.output_token_ids = \
+                            data.output_token_ids[:orig_output_tokens_len]
+
         for i in range(num_steps):
             with self.profiler.record_event('internal', model_event_name):
                 hidden_states = self.model.forward(
@@ -2155,24 +2168,30 @@
                 htorch.core.mark_step()
             if i < num_steps - 1:
                 if i == 0:
+                    import copy
                     ctx = model_input.async_callback.keywords[  # type: ignore
                         "ctx"]
                     seq_group_metadata_list = ctx.seq_group_metadata_list
+                    seq_group_metadata_list = copy.deepcopy(
+                        seq_group_metadata_list)
+                    # Cache the original output token ids
+                    for i, seq_group_metadata in enumerate(
+                            seq_group_metadata_list):
+                        cache_orig_output_tokens_len.append({})
+                        for j, data in seq_group_metadata.seq_data.items():
+                            cache_orig_output_tokens_len[i][j] = \
+                                len(data.output_token_ids)
                 for seq_group_metadata in seq_group_metadata_list:
                     for data in seq_group_metadata.seq_data.values():
                         max_output_len = sampling_metadata.seq_groups[
                             0].sampling_params.max_tokens
                         if len(data.output_token_ids) < max_output_len - 1:
                             # add a place holder for prepare_decode
                             # arbitrary value, this could be any token
                             dummy_token = (540, )
                             data.output_token_ids += (dummy_token)
                         else:
                             if num_steps == 1:
                                 return [output]
                             else:
+                                try_revert_dummy_output_tokens()
                                 return []

                 result = self._prepare_decode(seq_group_metadata_list,
@@ -2185,6 +2204,8 @@
                     "attn_metadata":
                     self.trim_attn_metadata(result.attn_metadata)
                 })
+            else:
+                try_revert_dummy_output_tokens()

             if self.is_driver_worker and self.profiler.enabled:
                 # Stop recording 'execute_model' event
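Note: the core idea of this diff is bookkeeping around the placeholder token. Before each intermediate step the runner remembers every sequence's real output length, appends an arbitrary dummy token so _prepare_decode has a last token to consume, and truncates back to the remembered length before anything else observes the sequence; the deepcopy keeps the scheduler's own metadata untouched. A standalone sketch of that pattern with a simplified stand-in type (not vLLM's SequenceData):

# Standalone illustration of the cache/append/revert pattern above.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class SeqData:  # stand-in for vLLM's SequenceData
    output_token_ids: Tuple[int, ...] = field(default_factory=tuple)

seqs: Dict[int, SeqData] = {0: SeqData((11, 12)), 1: SeqData((21, ))}
cache_orig_output_tokens_len: List[Dict[int, int]] = [{}]

# Step 1: remember the real output length of every sequence.
for j, data in seqs.items():
    cache_orig_output_tokens_len[0][j] = len(data.output_token_ids)

# Step 2: append an arbitrary placeholder (540 in the PR) so the next
# decode step has a "last token" to feed into prepare_decode.
dummy_token = (540, )
for data in seqs.values():
    data.output_token_ids += dummy_token
assert seqs[0].output_token_ids == (11, 12, 540)

# Step 3: on exit, truncate back to the cached lengths so the scheduler
# never sees the placeholder as real model output.
for j, data in seqs.items():
    orig_len = cache_orig_output_tokens_len[0][j]
    data.output_token_ids = data.output_token_ids[:orig_len]
assert seqs[0].output_token_ids == (11, 12)
assert seqs[1].output_token_ids == (21, )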