
Inconsistent prompt_eval_count for Large Prompts in Ollama Python Library #271

Open
surajyadav91 opened this issue Sep 6, 2024 · 4 comments

Comments

@surajyadav91

What is the issue?

Inconsistent prompt_eval_count for Large Prompts in Ollama Python Library

For larger prompts, when using the Ollama Python library with the llama3.1:8b-instruct-fp16 model, prompt_eval_count remains constant at a fixed value of 1026 tokens, even when the input prompt size varies significantly. This behavior is observed when using the ollama.chat() method.

import ollama

# prompt_template and model are defined elsewhere in the script;
# model is the llama3.1:8b-instruct-fp16 tag mentioned above.

def classify_incident(row):
    full_prompt = prompt_template + row['user_message']

    response = ollama.chat(
        model=model,
        options={'temperature': 0.01},
        messages=[{'role': 'user', 'content': full_prompt}],
    )

    # (prompt tokens, output tokens, total)
    total_token = (response['prompt_eval_count'], response['eval_count'],
                   response['prompt_eval_count'] + response['eval_count'])

    print(f'Tokens: {total_token}\n'
          f'Total_prompt_length: {len(full_prompt)}\n'
          f'{"=" * 50}\n')

Sample output:

Tokens: (1026, 15, 1041)
Total_prompt_length: 57788

Tokens: (1026, 20, 1046)
Total_prompt_length: 57172

Tokens: (1026, 18, 1044)
Total_prompt_length: 57744

Current Behavior

  • prompt_eval_count consistently returns the same value (1026), regardless of the actual prompt length.
  • eval_count (output tokens) varies as expected (this might also return a fixed value once longer text is generated).

Expected Behavior

  • prompt_eval_count should accurately reflect the number of tokens in the input prompt.
  • The value should change dynamically based on the input size and content.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.9

@rick-github

This sounds like you've exceeded the context buffer and the value is the number of tokens that were processed in the last slot window. Try adding "num_ctx": 60000 to the options in the ollama.chat() call. Note that this will increase the amount of VRAM required and, depending on your hardware, may push some of the model off the GPU and into system RAM for CPU inference.
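For example, a minimal sketch of that change against the snippet above (60000 is just an illustrative value large enough for the prompts shown; any context window that covers your largest prompt will do):

# Sketch: same call as in the issue, with the context window raised so the
# whole prompt is processed rather than just the last slot window.
response = ollama.chat(
    model='llama3.1:8b-instruct-fp16',
    options={'temperature': 0.01, 'num_ctx': 60000},
    messages=[{'role': 'user', 'content': full_prompt}],
)

# prompt_eval_count should now track the actual prompt size.
print(response['prompt_eval_count'], response['eval_count'])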

@surajyadav91
Author

> This sounds like you've exceeded the context buffer and the value is the number of tokens that were processed in the last slot window. Try adding "num_ctx": 60000 to the options in the ollama.chat() call. Note that this will increase the amount of VRAM required and, depending on your hardware, may push some of the model off the GPU and into system RAM for CPU inference.

Thanks for pointing this out.
I hadn't noticed this option earlier: https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values
I can see the default is 2048, so why does it top out at 1026 in my case? Even accounting for the num_predict option, whose default value is 128, the maximum should still have been more than 1026. Is this explained in detail somewhere, with examples?

@surajyadav91
Author

Also, by default num_ctx should be set to the model's maximum context length.

@rick-github

rick-github commented Sep 6, 2024

Source.

The context buffer is expensive in VRAM, with a cost that grows quadratically with length. I mentioned pushing layers off to the CPU above; if that happens, inference speed drops dramatically, so the default value is chosen to preserve performance. If the user wants a larger context, it can be extended with num_ctx in the API call or by creating a customized model with PARAMETER num_ctx xxx in the Modelfile.
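A sketch of the Modelfile route (the base tag and the 32768 value are placeholders, not recommendations):

# Modelfile sketch: bake a larger context window into a derived model.
FROM llama3.1:8b-instruct-fp16
PARAMETER num_ctx 32768

Building it with ollama create llama3.1-longctx -f Modelfile (the derived name is arbitrary) and pointing ollama.chat() at that name gives the larger context without passing num_ctx on every call.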

Flash attention can reduce the VRAM cost, but it doesn't work for all models.
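If I remember right, flash attention is toggled server-side with the OLLAMA_FLASH_ATTENTION environment variable, along these lines:

# Sketch: start the server with flash attention enabled, assuming the
# OLLAMA_FLASH_ATTENTION toggle applies to your backend and model.
OLLAMA_FLASH_ATTENTION=1 ollama serve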
