
[BUG]: DefaultSamplingPipeline - strange behavior at high temperature #928

PioneerMNDR opened this issue Sep 26, 2024 · 2 comments

PioneerMNDR commented Sep 26, 2024

Description

I decided to try the popular configuration min_p = 0.1 and temp = 1.5 or higher.
I get the following result:

[screenshot: garbled model output]

I used the example LLama.Examples/Examples/LLama3ChatSession.cs to show the incorrect behavior. The only things I changed were

    var chatHistory = new ChatHistory();

and

    var inferenceParams = new InferenceParams
    {
        SamplingPipeline = new DefaultSamplingPipeline
        {
            Temperature = 1.5f,
            MinP = 0.1f,
        },

        MaxTokens = 100, // keep generating tokens until the anti prompt is encountered
        AntiPrompts = [model.Tokens.EndOfTurnToken!] // model specific end of turn string
    };
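
For reference, the surrounding setup looks roughly like this (a condensed sketch, not the exact example code; it assumes the 0.16 API, the model path and GPU layer count are placeholders, and the example's prompt-template handling is omitted):

    using System;
    using LLama;
    using LLama.Common;
    using LLama.Sampling;

    // Placeholder path - point this at the local GGUF file.
    var parameters = new ModelParams("models/L3-8B-Stheno-v3.2-Q6_K.gguf")
    {
        GpuLayerCount = 33, // placeholder - adjust for the available VRAM
    };

    using var model = LLamaWeights.LoadFromFile(parameters);
    using var context = model.CreateContext(parameters);
    var executor = new InteractiveExecutor(context);

    var chatHistory = new ChatHistory();
    var session = new ChatSession(executor, chatHistory);

    var inferenceParams = new InferenceParams
    {
        SamplingPipeline = new DefaultSamplingPipeline
        {
            Temperature = 1.5f,
            MinP = 0.1f,
        },
        MaxTokens = 100,
        AntiPrompts = [model.Tokens.EndOfTurnToken!],
    };

    Console.Write("User> ");
    var input = Console.ReadLine()!;
    await foreach (var text in session.ChatAsync(
                       new ChatHistory.Message(AuthorRole.User, input),
                       inferenceParams))
    {
        Console.Write(text);
    }

Running this with Temperature = 1.5f and MinP = 0.1f reproduces the behavior described above.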

In my project I use BatchedExecutor with correct prompt-template formatting and anti-prompts, and I get exactly the same result. I also changed the sampling order in ProcessTokenDataArray and it did not change anything. I tested on CUDA and Vulkan. I noticed a pattern: the first 20-30 tokens are correct, and then chaos begins.
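
As a toy illustration of why the order could matter (this is not LLamaSharp's actual sampler code): min_p keeps tokens with probability >= maxProb * minP, and a high temperature flattens the distribution, so applying min_p after temperature keeps a longer tail than applying it to the untempered probabilities.

    using System;
    using System.Linq;

    // Toy illustration only - invented logits, not real model output.
    float[] logits = { 6.0f, 4.5f, 3.0f, 1.0f, -1.0f };

    var untempered = Softmax(logits);
    var tempered = Softmax(logits.Select(l => l / 1.5f).ToArray()); // temperature = 1.5

    Console.WriteLine($"min_p=0.1 before temperature: {CountKept(untempered)}/{logits.Length} tokens kept");
    Console.WriteLine($"min_p=0.1 after  temperature: {CountKept(tempered)}/{logits.Length} tokens kept");

    static float[] Softmax(float[] x)
    {
        var max = x.Max();
        var exps = x.Select(v => MathF.Exp(v - max)).ToArray();
        var sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    // Tokens surviving a min_p = 0.1 cutoff relative to the most likely token.
    static int CountKept(float[] probs) => probs.Count(p => p >= probs.Max() * 0.1f);

With these toy numbers the cutoff keeps 2 of 5 tokens on the untempered distribution but 3 of 5 after dividing the logits by 1.5, so the order does change which tokens remain candidates.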

In LM Studio and KoboldCpp I set the temperature even higher and min_p even lower, and everything worked fine there.

Reproduction Steps

  1. Use DefaultSamplingPipeline
  2. Set temperature higher than 1.2
  3. Set min_p = 0.1 or higher

Environment & Configuration

  • Operating system: Win10
  • .NET runtime version: 8.0.4
  • LLamaSharp version: 0.16.0
  • CUDA version (if you are using cuda backend): 12
  • CPU & GPU device: RTX 3050 8GB and i5-12400
  • Model: L3-8B-Stheno-v3.2-Q6_K.gguf

Known Workarounds

No response

martindevans (Member) commented:

If possible, could you try adding some breakpoints/logging into the calls here? These are basically the lowest-level calls, directly into llama.cpp.

In particular, look for two things:

  • Are the values you set actually getting passed through correctly? Just to make sure there's not something overwriting the values you've set (a minimal check is sketched after this list).
  • Are the other calls all being made with default values? Maybe try commenting them out just to be extra sure!
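
A minimal first check for that first point might look roughly like this (a sketch; it only confirms the managed-side property values before inference, not what actually reaches the native llama.cpp calls):

    using System;
    using LLama.Common;
    using LLama.Sampling;

    var pipeline = new DefaultSamplingPipeline
    {
        Temperature = 1.5f,
        MinP = 0.1f,
    };

    // Log the managed-side values right before handing the pipeline to the executor,
    // to rule out something overwriting them later.
    Console.WriteLine($"Temperature = {pipeline.Temperature}, MinP = {pipeline.MinP}");

    var inferenceParams = new InferenceParams
    {
        SamplingPipeline = pipeline,
        MaxTokens = 100,
    };

A breakpoint inside ProcessTokenDataArray (mentioned above) would be the more direct way to see what is actually passed down to llama.cpp.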

On a side note, the next version of LLamaSharp will completely replace sampling, because there has recently been a major redesign of the sampling API on the llama.cpp side.

PioneerMNDR (Author) commented:

These are the values that are passed if they are not defined:
[screenshot: the default values passed for the other samplers]
To keep the experiment clean, I also commented out the other samplers:
[screenshot: the sampling code with the other samplers commented out]

Nothing has changed (1):
[screenshot: garbled output, first attempt]
Nothing has changed (2):
[screenshot: garbled output, second attempt]
It feels like when the model gets a high temperature it forgets the EOS token and starts hallucinating.
I decided to run an experiment: if I turn min_p up to 1, the model always responds the same way, regardless of temperature:

[screenshots: three example runs (Ex1-Ex3) with min_p = 1, each giving an identical reply]

The experiment shows that the min_p sampler works, and I really don't understand what the problem is. That said, I reduced min_p to 0.01, and the model still told the joke about the bicycle.
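
For what it's worth, the min_p = 1 result is what the cutoff rule predicts (a toy check, not LLamaSharp code): with min_p = 1 the threshold equals the maximum probability, so only the single most likely token survives, and temperature cannot change which token that is, so generation becomes effectively greedy.

    using System;
    using System.Linq;

    // Toy check with invented, already-normalised probabilities.
    float[] probs = { 0.646f, 0.238f, 0.087f, 0.023f, 0.006f };
    const float minP = 1.0f;

    var threshold = probs.Max() * minP;
    var survivors = probs.Count(p => p >= threshold);

    Console.WriteLine($"min_p = {minP}: {survivors} token(s) survive the cutoff");

So the min_p = 1 case only confirms the cutoff itself; it says nothing about why min_p = 0.1 with temperature 1.5 produces the garbled output.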
