
decoding parameters (e.g., temperature) for Gemma-2? #64

Open

iseesaw opened this issue Sep 5, 2024 · 4 comments

iseesaw commented Sep 5, 2024

Hello, how should I set the decoding parameters (e.g., temperature) for Gemma-2? My result is around 50.0, far from the reported benchmark of 76.



MaoXinn commented Sep 15, 2024

Hi, I also ran into this problem. I only got the following WR/LC:
54.47204968944099, 59.969975205397596

Here is my evaluation config:

```yaml
Gemma-2-Aligned-simpo:
  completions_kwargs:
    batch_size: 900
    max_new_tokens: 4096
    model_kwargs:
      dtype: bfloat16
    model_name: princeton-nlp/gemma-2-9b-it-SimPO
    stop_token_ids:
      - 1
      - 107
    temperature: 0.5
    top_p: 1.0
  fn_completions: vllm_local_completions
  pretty_name: gemma-2-9b-it-SimPO
  prompt_template: ./eval_config/gemma2_prompt.txt
```

The only difference is that I removed "do_sample: true".

I reviewed your config and your conversation with the AlpacaEval author on GitHub, and now I'm quite confused.
Even after downgrading AlpacaEval 2 to 0.6.2, I still couldn't run it with the configuration you provided. The main problem seems to be beam search. Should I enable beam search? If so, the temperature must be set to 0, but I don't know what the beam size should be.
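
For reference, here is a minimal sketch (not from this thread) of how the decoding settings above map onto vLLM sampling parameters. It assumes an older vLLM release in which SamplingParams still accepts use_beam_search; newer releases removed that flag:

```python
from vllm import SamplingParams

# Sampling, matching the YAML config above.
sampling = SamplingParams(
    temperature=0.5,
    top_p=1.0,
    max_tokens=4096,          # "max_new_tokens" in the YAML
    stop_token_ids=[1, 107],  # Gemma-2's <eos> and <end_of_turn>
)

# Beam search (older vLLM only): temperature must be 0, and best_of
# serves as the beam width. The width of 4 here is a placeholder,
# since the thread never settles on a value.
beam = SamplingParams(
    use_beam_search=True,
    best_of=4,
    temperature=0.0,
    max_tokens=4096,
    stop_token_ids=[1, 107],
)
```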

Thank you~


LotuSrc commented Sep 23, 2024

Maybe you used alpaca_eval_gpt4_turbo_fn as the annotator. With that setting, the result is close to the one you reported.
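
If that is the cause, the sketch below shows where the annotator choice enters the scoring call. It assumes alpaca_eval's Python entry point alpaca_eval.main.evaluate and a hypothetical outputs path; exact signatures vary across versions:

```python
from alpaca_eval import main

# Score the same generations under both annotator configs. AE2's
# headline LC/WR numbers use the length-controlled
# "weighted_alpaca_eval_gpt4_turbo" annotator; the older
# "alpaca_eval_gpt4_turbo_fn" yields different numbers.
for annotator in ("weighted_alpaca_eval_gpt4_turbo", "alpaca_eval_gpt4_turbo_fn"):
    main.evaluate(
        model_outputs="model_outputs.json",  # hypothetical path
        annotators_config=annotator,
    )
```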

@xiamengzhou
Contributor

@MaoXinn It's a bit tricky to interpret what happened from the information you provided. How about we troubleshoot it step by step? You could begin by running the evaluation on the outputs we provided for AlpacaEval and checking whether you get a similar score first.
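
As a concrete form of that first step, a minimal sketch (path hypothetical) of scoring the released outputs before touching the local decoding setup:

```python
from alpaca_eval import main

# If the provided generations reproduce the ~76 benchmark score, the
# gap comes from local decoding, not from the scoring setup. The AE2
# default annotator ("weighted_alpaca_eval_gpt4_turbo") is used.
main.evaluate(
    model_outputs="gemma-2-9b-it-SimPO/model_outputs.json",  # hypothetical path to the provided outputs
)
```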
