
Cannot reproduce the results of Llama7B dora_r32. #14

Open · xiaoshingshing2 opened this issue Jul 2, 2024 · 7 comments

xiaoshingshing2 commented Jul 2, 2024

First of all, evaluation with the official checkpoint works fine: the result on BoolQ is 69.63, while the official result is 69.7.

However, when I try to reproduce the results, I encounter two problems.

The first is with Llama7B dora_r32 without dora_simple. I change three things in `llama_7B_Dora.sh`: `micro_batch_size` from 16 to 4, `learning_rate` from 2e-4 to 1e-4, and I add `--dora_simple False` so that dora_simple is not used. I then run `sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0`, and the results are:

| BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
|-------|------|------|-----------|------------|-------|-------|------|---------|
| 69.3  | 78.9 | 78.3 | 54.3      | 80.0       | 82.6  | 66.1  | 81.0 | 73.8    |

which are worse than the official results.
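For reference, here is a minimal sketch of what the modified invocation amounts to, assuming `llama_7B_Dora.sh` forwards its positional arguments (`$1` = rank, `$2` = alpha, `$3` = output dir, `$4` = GPU id) to `finetune.py`; the flag names below are assumptions based on the changes described above, not the script's exact contents:

```bash
# Hypothetical sketch of the modified fine-tuning call (flag names assumed):
CUDA_VISIBLE_DEVICES="$4" python finetune.py \
    --lora_r "$1" \
    --lora_alpha "$2" \
    --output_dir "$3" \
    --micro_batch_size 4 \
    --learning_rate 1e-4 \
    --dora_simple False
# Changes vs. the shipped script: micro_batch_size 16 -> 4,
# learning_rate 2e-4 -> 1e-4, and --dora_simple False added.
```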

The second is that when I remove `--dora_simple False` (i.e., train with dora_simple to accelerate training), the results are even worse:

| BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
|-------|------|------|-----------|------------|-------|-------|------|---------|
| 32.9  | 75.5 | 71.8 | 9.9       | 41.3       | 81.9  | 66.3  | 75.8 | 56.9    |
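For context, here is a minimal Python sketch of the DoRA weight recomposition and where dora_simple differs, based on the paper's decomposition W' = m · (W0 + BA) / ||W0 + BA||_c; this is an illustration rather than the repo's exact code, and the LoRA scaling factor is omitted:

```python
import torch

def dora_weight(W0, B, A, m, dora_simple=True):
    """Illustrative DoRA recomposition: W' = m * (W0 + B @ A) / ||.||_c.

    W0: (out, in) frozen base weight; B: (out, r); A: (r, in);
    m: (1, in) learnable per-column magnitude.
    """
    adapted = W0 + B @ A                           # directional update
    norm = adapted.norm(p=2, dim=0, keepdim=True)  # column-wise norm
    if dora_simple:
        # dora_simple detaches the norm, treating it as a constant during
        # backprop to save memory and compute (an approximation of the
        # full gradient, per the paper).
        norm = norm.detach()
    return m * adapted / norm
```

Since the detach is described in the paper as saving memory with little accuracy cost, a collapse of this size with dora_simple enabled looks more like an environment or configuration issue than an expected trade-off.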
xiaoshingshing2 (Author) commented Jul 2, 2024

Here are the training log and the adapter config with `--dora_simple False`:
trainer_state.json
adapter_config.json
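One quick sanity check is to read the saved adapter config directly; the key names below are assumptions (they follow common PEFT conventions) and may differ between releases:

```python
import json

# Print the fields most relevant to this issue from the saved adapter config.
# Key names are assumed; adjust to what adapter_config.json actually contains.
with open("adapter_config.json") as f:
    cfg = json.load(f)
for key in ("r", "lora_alpha", "target_modules", "dora_simple"):
    print(key, "=", cfg.get(key))
```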

@xiaoshingshing2 xiaoshingshing2 changed the title Cannot reproduce the results of Llama7B. Cannot reproduce the results of Llama7B dora_r32. Jul 2, 2024
nbasyl (Collaborator) commented Jul 2, 2024

Did you install all the packages following requirements.txt?

xiaoshingshing2 (Author) commented Jul 3, 2024

Hi, I did not install bitsandbytes, and my PyTorch version is 2.1.0. The transformers package was installed with `pip install transformers==4.36.0`. Other packages are the same as in requirements.txt.

Will that hurt performance?

The packages I use are listed below:

Package Version


accelerate 0.25.0
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.7.0
anyio 4.4.0
appdirs 1.4.4
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
black 23.12.0
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
cmake 3.29.6
contourpy 1.2.1
cycler 0.12.1
datasets 2.15.0
decorator 5.1.1
dill 0.3.7
dnspython 2.6.1
email_validator 2.2.0
exceptiongroup 1.2.1
executing 2.0.1
fastapi 0.111.0
fastapi-cli 0.0.4
ffmpy 0.3.2
filelock 3.15.4
fire 0.5.0
fonttools 4.53.0
frozenlist 1.4.1
fsspec 2023.10.0
gradio 4.9.0
gradio_client 0.7.2
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.4
idna 3.7
importlib_resources 6.4.0
ipython 8.25.0
jedi 0.19.1
Jinja2 3.1.4
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lit 18.1.8
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.0
matplotlib-inline 0.1.7
mdurl 0.1.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.15
mypy-extensions 1.0.0
networkx 3.3
numpy 1.26.4
orjson 3.10.5
packaging 24.1
pandas 2.2.2
parso 0.8.4
pathspec 0.12.1
pexpect 4.9.0
pillow 10.3.0
pip 24.0
platformdirs 4.2.2
prompt_toolkit 3.0.47
protobuf 5.27.2
psutil 6.0.0
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 16.1.0
pyarrow-hotfix 0.6
pydantic 2.7.4
pydantic_core 2.18.4
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.9
pytorch-triton-rocm 2.1.0
pytz 2024.1
PyYAML 6.0.1
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
rich 13.7.1
rpds-py 0.18.1
safetensors 0.4.3
scipy 1.11.4
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 69.5.1
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
stack-data 0.6.3
starlette 0.37.2
sympy 1.12.1
termcolor 2.4.0
tokenize-rt 5.2.0
tokenizers 0.15.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.1
torch 2.1.0+rocm5.6
tqdm 4.66.4
traitlets 5.14.3
transformers 4.36.0
typer 0.12.3
typing_extensions 4.12.2
tzdata 2024.1
ujson 5.10.0
urllib3 2.2.2
uvicorn 0.30.1
uvloop 0.19.0
watchfiles 0.22.0
wcwidth 0.2.13
websockets 11.0.3
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4

I am now trying to use exactly the same packages as in requirements.txt, and will update my results when the finetuning and testing finish.
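One way to verify the environment actually matches is to reinstall from the repo's requirements file and diff the pinned versions against what is installed; this assumes requirements.txt uses plain `pkg==version` lines, and the diff step needs bash:

```bash
# Reinstall pinned versions, then compare against what is actually installed:
pip install -r requirements.txt
pip freeze | sort > installed.txt
diff <(sort requirements.txt) installed.txt   # empty output = exact match
```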

xiaoshingshing2 (Author) commented

I used exactly the packages in requirements.txt, and the results are:

With dora_simple:

| BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
|-------|------|------|-----------|------------|-------|-------|------|---------|
| 69.1  | 82.8 | 78.8 | 86.2      | 81.0       | 82.1  | 66.1  | 79.2 | 78.2    |

Without dora_simple:

| BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
|-------|------|------|-----------|------------|-------|-------|------|---------|
| 68.7  | 83.3 | 79.4 | 85.5      | 81.3       | 80.8  | 66.0  | 78.8 | 78.0    |

which still leaves a 0.4% average accuracy gap from the results reported in the README.

xiaoshingshing2 (Author) commented Jul 8, 2024

New updates:

I used exactly the packages in requirements.txt. The results for r = 8 and r = 16 still have a large gap from the results reported in the README, while the results for r = 4 and r = 64 are better and the result for r = 32 is roughly equal.

Average accuracy:

| r  | original | reproduced |
|----|----------|------------|
| 4  | 61.9     | 65.2       |
| 8  | 77.9     | 72.5       |
| 16 | 77.5     | 62.7       |
| 32 | 78.4     | 78.2       |
| 64 | 76.8     | 77.9       |
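For reference, a sketch of how such a rank sweep could be invoked, assuming the same positional-argument convention as the r = 32 run above and lora_alpha = 2·r (the alpha convention is an assumption):

```bash
# Hypothetical rank sweep; mirrors the r=32 invocation shown earlier.
for r in 4 8 16 32 64; do
    sh llama_7B_Dora.sh "$r" "$((2 * r))" "./finetuned_result/dora_r${r}" 0
done
```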

Is this a normal result?

zhanqiqi77 commented

@xiaoshingshing2 I have encountered a similar issue. Did you manage to resolve it? Could you provide your package versions?

xiaoshingshing2 (Author) commented

> @xiaoshingshing2 I have encountered a similar issue. Did you manage to resolve it? Could you provide your package versions?

In my latest update, I used exactly the packages in requirements.txt with the same versions, and I still have the problem.
