oneprof fails for LLM workloads #45

Open

sriraman2020 opened this issue Oct 27, 2023 · 0 comments

@sriraman2020

(llama-17oct) user@BA-ARCH-LAB-SPR-PVC-2T:~/17oct/frameworks.ai.pytorch.gpu-models/LLM/generation$ /home/user/17oct/pti-gpu/tools/oneprof/build/./oneprof -q -o newlog_llama7b_oneprof_q_O_log.txt -p /home/user/17oct/oneprof_temp/ -s 1000 python -u run_generation.py --device xpu --ipex --dtype float16 --input-tokens 32 --max-new-tokens 32 --num-beam 1 --benchmark -m decapoda-research/llama-7b-hf --sub-model-name llama-7b
Namespace(model_id='decapoda-research/llama-7b-hf', sub_model_name='llama-7b', device='xpu', dtype='float16', input_tokens='32', max_new_tokens=32, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', accuracy_only=False, num_beam=1, num_iter=10, num_warmup=3, batch_size=1, token_latency=False, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='')
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:36<00:00, 1.11s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
python: /home/user/17oct/pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

Basically, the issue is with the "-q" option; the tool seems to run fine with the "-k" option.
Could you please look into this with priority? It is blocking analysis of LLM workloads.
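
For reference, the assertion comes from oneprof's metric query cache, which creates Level Zero metric queries when the "-q" (metric query) mode is used. Below is a minimal standalone sketch of the two Level Zero calls involved (not oneprof's actual code; the CreateMetricQuery helper and the context, device, and metric-group handles are assumptions for illustration), with the raw ze_result_t logged instead of asserted on, so the exact status returned for this workload would be visible.

```cpp
// Minimal sketch, not oneprof's actual code: the two Level Zero calls behind
// a metric query, with the raw ze_result_t logged instead of asserted on.
// The context, device, and metric-group handles are assumed to be obtained
// elsewhere (e.g. during tool initialization).
#include <level_zero/zet_api.h>
#include <cstdio>

zet_metric_query_handle_t CreateMetricQuery(ze_context_handle_t context,
                                            ze_device_handle_t device,
                                            zet_metric_group_handle_t group) {
  // A query pool has to exist before individual queries can be created from it.
  zet_metric_query_pool_desc_t pool_desc = {
      ZET_STRUCTURE_TYPE_METRIC_QUERY_POOL_DESC, nullptr,
      ZET_METRIC_QUERY_POOL_TYPE_PERFORMANCE, 1 /* one query slot */};
  zet_metric_query_pool_handle_t pool = nullptr;
  ze_result_t status =
      zetMetricQueryPoolCreate(context, device, group, &pool_desc, &pool);
  if (status != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "zetMetricQueryPoolCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }

  zet_metric_query_handle_t query = nullptr;
  status = zetMetricQueryCreate(pool, 0 /* index within the pool */, &query);
  if (status != ZE_RESULT_SUCCESS) {
    // This is the status the assertion in metric_query_cache.h:69 checks;
    // printing it here shows why query creation fails under "-q".
    std::fprintf(stderr, "zetMetricQueryCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }
  return query;
}
```

Logging the status rather than asserting would also show whether the failure happens when creating the pool or the individual query.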

jfedorov pushed a commit that referenced this issue Dec 18, 2023
* adding host synchronize event handling to hot functions sample