(llama-17oct) user@BA-ARCH-LAB-SPR-PVC-2T:~/17oct/frameworks.ai.pytorch.gpu-models/LLM/generation$ /home/user/17oct/pti-gpu/tools/oneprof/build/./oneprof -q -o newlog_llama7b_oneprof_q_O_log.txt -p /home/user/17oct/oneprof_temp/ -s 1000 python -u run_generation.py --device xpu --ipex --dtype float16 --input-tokens 32 --max-new-tokens 32 --num-beam 1 --benchmark -m decapoda-research/llama-7b-hf --sub-model-name llama-7b
Namespace(model_id='decapoda-research/llama-7b-hf', sub_model_name='llama-7b', device='xpu', dtype='float16', input_tokens='32', max_new_tokens=32, prompt=None, greedy=False, ipex=True, jit=False, profile=False, benchmark=True, lambada=False, dataset='lambada', accuracy_only=False, num_beam=1, num_iter=10, num_warmup=3, batch_size=1, token_latency=False, print_memory=False, disable_optimize_transformers=False, woq=False, calib_dataset='wikitext2', calib_group_size=-1, calib_output_dir='./', calib_checkpoint_name='quantized_weight.pt', calib_nsamples=128, calib_wbits=4, calib_seed=0, woq_checkpoint_path='')
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:36<00:00, 1.11s/it]
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565
python: /home/user/17oct/pti-gpu/tools/oneprof/metric_query_cache.h:69: _zet_metric_query_handle_t* MetricQueryCache::GetQuery(ze_context_handle_t): Assertion `status == ZE_RESULT_SUCCESS' failed.
The failure occurs only with the "-q" option; the same command runs fine with the "-k" option.
Could you please look into this on priority? It is blocking analysis of LLM workloads.
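For context, below is a minimal, hypothetical sketch (not the actual pti-gpu code) of the Level Zero metric-query creation sequence that MetricQueryCache::GetQuery presumably wraps. The abort above corresponds to one of these calls returning something other than ZE_RESULT_SUCCESS; the CreateQuery helper and its error reporting here are illustrative only, showing where surfacing the actual ze_result_t would help diagnose the "-q" path:

```cpp
// Illustrative sketch only, assuming GetQuery wraps the standard Level Zero
// metric-query creation calls (zetMetricQueryPoolCreate / zetMetricQueryCreate).
#include <level_zero/zet_api.h>
#include <cstdio>

zet_metric_query_handle_t CreateQuery(ze_context_handle_t context,
                                      ze_device_handle_t device,
                                      zet_metric_group_handle_t group) {
  // Describe a pool holding a single performance query.
  zet_metric_query_pool_desc_t pool_desc = {
      ZET_STRUCTURE_TYPE_METRIC_QUERY_POOL_DESC, nullptr,
      ZET_METRIC_QUERY_POOL_TYPE_PERFORMANCE, 1 /* count */};

  zet_metric_query_pool_handle_t pool = nullptr;
  ze_result_t status =
      zetMetricQueryPoolCreate(context, device, group, &pool_desc, &pool);
  // Printing the returned status is more informative than the bare
  // assert(status == ZE_RESULT_SUCCESS) seen in the crash log above.
  if (status != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "zetMetricQueryPoolCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }

  zet_metric_query_handle_t query = nullptr;
  status = zetMetricQueryCreate(pool, 0 /* index within the pool */, &query);
  if (status != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "zetMetricQueryCreate failed: 0x%x\n",
                 static_cast<unsigned>(status));
    return nullptr;
  }
  return query;
}
```

If the tool printed the failing ze_result_t value here, it would be easier to tell whether the "-q" run fails because the query pool cannot be created for this context/device or because the pool is exhausted during the LLM run.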