oneprof fails for LLM workloads #45

Open

sriraman2020 opened this issue Oct 27, 2023 · 0 comments

@sriraman2020

(llama-17oct) user@BA-ARCH-LAB-SPR-PVC-2T:~/17oct/frameworks.ai.pytorch.gpu-models/LLM/generation$ /home/user/17oct/pti-gpu/tools/oneprof/build/./oneprof -q -o newlog_llama7b_oneprof_q_O_log.txt -p /home/user/17oct/oneprof_temp/ -s 1000 python -u run_generation.py --device xpu --ipex --dtype float16 --input-tokens 32 --max-new-tokens 32 --num-beam 1 --benchmark -m decapoda-research/llama-7b-hf --sub-model-name llama-7b
Namespace(model_id='decapoda-research/llama-7b-hf', sub_model_name='llama-7b', device='xpu', dtype='float16', input_tokens='32', max_new_tokens=32, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', accuracy_only=False, num_beam=1, num_iter=10, num_warmup=3, batch_size=1, token_latency=False, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='')
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:36<00:00, 1.11s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
python: /home/user/17oct/pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.

Basically, the issue is with the "-q" option; the tool seems to run fine with the "-k" option.
Could you please look into this with priority? It is blocking analysis of LLM workloads.
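
For reference, the assertion comes from oneprof's metric query cache, which creates Level Zero metric queries when the "-q" (metric query) mode is used. Below is a minimal standalone sketch of the two Level Zero calls involved (not oneprof's actual code; the CreateMetricQuery helper and the context, device, and metric-group handles are assumptions for illustration), with the raw ze_result_t logged instead of asserted on, so the exact status returned for this workload would be visible.

```cpp
// Minimal sketch, not oneprof's actual code: the two Level Zero calls behind
// a metric query, with the raw ze_result_t logged instead of asserted on.
// The context, device, and metric-group handles are assumed to be obtained
// elsewhere (e.g. during tool initialization).
#include <level_zero/zet_api.h>
#include <cstdio>

zet_metric_query_handle_t CreateMetricQuery(ze_context_handle_t context,
                                            ze_device_handle_t device,
                                            zet_metric_group_handle_t group) {
  // A query pool has to exist before individual queries can be created from it.
  zet_metric_query_pool_desc_t pool_desc = {
      ZET_STRUCTURE_TYPE_METRIC_QUERY_POOL_DESC, nullptr,
      ZET_METRIC_QUERY_POOL_TYPE_PERFORMANCE, 1 /* one query slot */};
  zet_metric_query_pool_handle_t pool = nullptr;
  ze_result_t status =
      zetMetricQueryPoolCreate(context, device, group, &pool_desc, &pool);
  if (status != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "zetMetricQueryPoolCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }

  zet_metric_query_handle_t query = nullptr;
  status = zetMetricQueryCreate(pool, 0 /* index within the pool */, &query);
  if (status != ZE_RESULT_SUCCESS) {
    // This is the status the assertion in metric_query_cache.h:69 checks;
    // printing it here shows why query creation fails under "-q".
    std::fprintf(stderr, "zetMetricQueryCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }
  return query;
}
```

Logging the status rather than asserting would also show whether the failure happens when creating the pool or the individual query.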

jfedorov pushed a commit that referenced this issue Dec 18, 2023
* adding host synchronize event handling to hot functions sample