Compare jcommonsense qa prompts with question first vs last #113

Open
wants to merge 6 commits into
base: jp-stable

Conversation

kumapo

@kumapo kumapo commented Nov 4, 2023

As reported in this article, JCommonsenseQA prompts that put the question last yield better performance.
As the table below shows, I reproduced the performance jump by changing only the position of the question in the prompt.

Currently, however, only the v0.3 and v0.6 prompts put the question last; the others put it first.
To ensure a fair comparison between models, all prompts should place the question in the same position.
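
To make the two orderings concrete, here is a minimal sketch of a prompt builder that changes only the position of the question. The function name `build_prompt` and the `Question:` / `Choices:` / `Answer:` labels are hypothetical and chosen for illustration; they are not the actual templates of any prompt version in this repository.

```python
def build_prompt(question: str, choices: list[str], question_last: bool) -> str:
    """Build a multiple-choice prompt with the question before or after the choices.

    Hypothetical template: the labels below are illustrative only and do not
    reproduce any of the repository's prompt versions.
    """
    question_part = f"Question: {question}"
    choices_part = "Choices:\n" + "\n".join(f"- {c}" for c in choices)
    # Only the order of the two parts differs between the variants.
    if question_last:
        parts = [choices_part, question_part]
    else:
        parts = [question_part, choices_part]
    return "\n".join(parts) + "\nAnswer:"


if __name__ == "__main__":
    example_choices = ["school", "bank", "park"]
    print(build_prompt("Where do students study?", example_choices, question_last=False))
    print()
    print(build_prompt("Where do students study?", example_choices, question_last=True))
```

Keeping everything except the ordering identical is what allows an accuracy difference to be attributed to question position alone.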

What do you think about adding prompts that put the question last, or updating the current prompts to put the question last?
If I missed an experiment, please let me know.

| Model | Acc. of Question First (Prompt Ver.) | Acc. of Question Last (Prompt Ver.) |
| --- | --- | --- |
| japanese-stablelm-base-alpha-7b | 0.5728 (v0.2.1) | 0.7954 (v0.2.2) |
| open-calm-3b | 0.3128 (v0.2.1) | 0.7453 (v0.2.2) |
| ELYZA-japanese-Llama-2-7b | 0.7516 (v0.2.1) | 0.7730 (v0.2.2) |
| llama2-7b-chat | 0.5952 (v0.3.2) | 0.5559 (v0.3) |
| japanese-stablelm-instruct-alpha-7b | 0.5898 (v0.3.2) | 0.8222 (v0.3) |
| rinna-japanese-gpt-neox-3.6b-instruction-ppo | 0.4406 (v0.4) | 0.5934 (v0.4.2) |
| rinna-bilingual-gpt-neox-4b-instruction-ppo | 0.4879 (v0.5) | 0.5237 (v0.5.2) |
| llama2-7b-chat | 0.6667 (v0.6.2) | 0.613 (v0.6) |
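
To read the table as per-model differences, a short script along these lines can help. The accuracies are copied from the table above; the helper name `accuracy_deltas` is hypothetical, introduced only for this sketch.

```python
# (model, prompt versions, accuracy with question first, accuracy with question last),
# copied from the table in this PR description.
RESULTS = [
    ("japanese-stablelm-base-alpha-7b", "v0.2.1 / v0.2.2", 0.5728, 0.7954),
    ("open-calm-3b", "v0.2.1 / v0.2.2", 0.3128, 0.7453),
    ("ELYZA-japanese-Llama-2-7b", "v0.2.1 / v0.2.2", 0.7516, 0.7730),
    ("llama2-7b-chat", "v0.3.2 / v0.3", 0.5952, 0.5559),
    ("japanese-stablelm-instruct-alpha-7b", "v0.3.2 / v0.3", 0.5898, 0.8222),
    ("rinna-japanese-gpt-neox-3.6b-instruction-ppo", "v0.4 / v0.4.2", 0.4406, 0.5934),
    ("rinna-bilingual-gpt-neox-4b-instruction-ppo", "v0.5 / v0.5.2", 0.4879, 0.5237),
    ("llama2-7b-chat", "v0.6.2 / v0.6", 0.6667, 0.613),
]


def accuracy_deltas(results):
    """Return (model, versions, last - first) for each row; positive means question-last is better."""
    return [(model, versions, last - first) for model, versions, first, last in results]


if __name__ == "__main__":
    for model, versions, delta in accuracy_deltas(RESULTS):
        print(f"{model} ({versions}): {delta:+.4f}")
```

On these numbers, putting the question last is higher in six of the eight rows; both llama2-7b-chat rows are lower with the question last.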

@kumapo kumapo marked this pull request as ready for review November 5, 2023 09:00
@kumapo kumapo requested a review from jon-tow as a code owner November 5, 2023 09:00
@kumapo kumapo changed the title from "Compare question first vs last" to "Compare jcommonsense qa prompts with question first vs last" Nov 5, 2023