Paper provided example can not be reproduced !! #153

Open

HuangChiEn opened this issue Oct 15, 2024 · 6 comments

HuangChiEn commented Oct 15, 2024

I ran the example from the README.md.
[screenshot]

  1. The load-8bit response is acceptable, but it didn't give me any explanation.
    [screenshot]

  2. I figured 8-bit loading might degrade performance, so I ran in fp16 mode only (no aggressive quantization), but I got even worse results: it still gives no explanation. (See the sketch of the two loading modes below.)
    [screenshot]
    [screenshot]

Also, a bunch of out-of-control [SEG] tokens pop out. Why?
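
For reference, here is a minimal sketch of the two loading modes I am comparing, written against the plain transformers API rather than LISA's actual chat.py (the checkpoint tag and kwargs are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM

MODEL = "xinlai/LISA-13B-llama2-v1"  # assumed checkpoint tag

# Case 1: bitsandbytes int8 quantization (the load-8bit run).
model_8bit = AutoModelForCausalLM.from_pretrained(
    MODEL, load_in_8bit=True, device_map="auto"
)

# Case 2: plain fp16, no quantization (the second run).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
```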

About versions and packages:

accelerate                1.0.1
aiofiles                  23.2.1
aiohappyeyeballs          2.4.3
aiohttp                   3.10.10
aiosignal                 1.3.1
altair                    5.4.1
annotated-types           0.7.0
anyio                     4.6.2
async-timeout             4.0.3
attrs                     24.2.0
autocommand               2.2.2
backports.tarfile         1.2.0
bitsandbytes              0.41.1
certifi                   2024.8.30
charset-normalizer        3.4.0
click                     8.1.7
contourpy                 1.3.0
cycler                    0.12.1
deepspeed                 0.15.2
einops                    0.4.1
exceptiongroup            1.2.2
fastapi                   0.100.1
ffmpy                     0.4.0
filelock                  3.16.1
flash_attn                2.6.3
fonttools                 4.54.1
frozenlist                1.4.1
fsspec                    2024.9.0
gradio                    3.39.0
gradio_client             1.3.0
grpcio                    1.66.2
h11                       0.14.0
hjson                     3.1.0
httpcore                  1.0.6
httpx                     0.27.2
huggingface-hub           0.25.2
idna                      3.10
importlib_metadata        8.0.0
importlib_resources       6.4.5
inflect                   7.3.1
jaraco.collections        5.1.0
jaraco.context            5.3.0
jaraco.functools          4.0.1
jaraco.text               3.12.1
Jinja2                    3.1.4
joblib                    1.4.2
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
kiwisolver                1.4.7
linkify-it-py             2.0.3
markdown-it-py            2.2.0
markdown2                 2.4.10
MarkupSafe                2.1.5
matplotlib                3.9.2
mdit-py-plugins           0.3.3
mdurl                     0.1.2
more-itertools            10.3.0
mpmath                    1.3.0
msgpack                   1.1.0
multidict                 6.1.0
narwhals                  1.9.3
networkx                  3.2.1
ninja                     1.11.1.1
numpy                     1.24.2
nvidia-ml-py              12.560.30
openai                    0.27.8
opencv-python             4.8.0.74
orjson                    3.10.7
packaging                 24.1
pandas                    2.2.3
peft                      0.4.0
Pillow                    9.4.0
pip                       24.2
platformdirs              4.2.2
propcache                 0.2.0
protobuf                  5.28.2
psutil                    6.0.0
py-cpuinfo                9.0.0
pycocotools               2.0.6
pydantic                  2.9.2
pydantic_core             2.23.4
pydub                     0.25.1
pyparsing                 3.2.0
python-dateutil           2.9.0.post0
python-multipart          0.0.12
pytz                      2024.2
PyYAML                    6.0.2
ray                       2.6.1
referencing               0.35.1
regex                     2024.9.11
requests                  2.31.0
rpds-py                   0.20.0
sacremoses                0.1.1
safetensors               0.4.5
scipy                     1.11.2
semantic-version          2.10.0
sentencepiece             0.2.0
setuptools                75.1.0
shortuuid                 1.0.11
six                       1.16.0
sniffio                   1.3.1
starlette                 0.27.0
sympy                     1.12
tokenizers                0.15.2
tomli                     2.0.1
torch                     2.1.2+cu121
torchaudio                2.1.2+cu121
torchvision               0.16.2+cu121
tqdm                      4.64.1
transformers              4.35.2
triton                    2.1.0
typeguard                 4.3.0
typing_extensions         4.12.2
tzdata                    2024.2
uc-micro-py               1.0.3
urllib3                   2.2.3
uvicorn                   0.23.2
websockets                11.0.3
wheel                     0.44.0
yarl                      1.15.2
zipp                      3.20.2

I had encountered several issues, so I pinned the transformers version to match LLaVA's and modified the code according to this issue:
haotian-liu/LLaVA#968

The real problem that I'm afraid affects the decoding strategy is this one:
salesforce/LAVIS#571

So I have replaced every private function (i.e. _expand_mask) with its object-based equivalent to get past Python's static checks.
Moreover, I placed a RuntimeError at the beginning of every function that uses them (and I never hit any RuntimeError).
That means none of these private functions are ever called during inference.
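
Concretely, a minimal sketch of both the replacement and the probe, assuming transformers 4.35.x, where the removed private helper's logic lives on AttentionMaskConverter (verify against your installed version):

```python
import torch
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

# Replacement: stand-in for the _expand_mask helper that was removed
# from transformers.models.llama.modeling_llama.
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: int = None) -> torch.Tensor:
    return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)

# Probe: the RuntimeError trick described above. Put this raise at the
# top of each suspect function; if inference never crashes, the function
# is never reached and cannot affect the decoding strategy.
def _expand_mask_probed(*args, **kwargs):
    raise RuntimeError("_expand_mask was reached during inference")
```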


Any suggestions would be appreciated!!

HuangChiEn changed the title from "Less performance then paper described" to "Paper provided example doesn't work ?" on Oct 15, 2024
HuangChiEn changed the title from "Paper provided example doesn't work ?" to "Paper provided example can not be reproduced !!" on Oct 15, 2024

jifeng35 commented Oct 22, 2024

Didn't you see the training data?
The GT that LLaVA is trained to output is "Sure, it's the <seg>."
It was trained to answer like that, so you cannot force it to output an explanation.
That said, the authors' demo does seem to be wrong.
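
For intuition, a hypothetical sketch of what such answer templates tend to look like in this kind of training code (the strings and names are illustrative, not quoted from the LISA repo):

```python
import random

# Illustrative GT answer templates for a segmentation request. None of
# them contains an explanation, so a model fine-tuned on them has no
# explanatory phrasing to imitate unless the data explicitly include it.
ANSWER_TEMPLATES = [
    "Sure, it is [SEG].",
    "It is [SEG].",
    "[SEG].",
]

def sample_gt_answer() -> str:
    # The ground-truth text the LLM is tuned to emit for a seg request.
    return random.choice(ANSWER_TEMPLATES)
```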


HuangChiEn commented Oct 22, 2024

Didn't you see the training data? The GT that LLaVA is trained to output is "Sure, it's the <seg>." It was trained to answer like that, so you cannot force it to output an explanation. That said, the authors' demo does seem to be wrong.

Yeah, I agree with your point; the model can hardly generate phrasing it never saw in the training set.
However, the demo doesn't give just one such example:
[screenshot]
[screenshot]

So I wonder how to reproduce those inference results by adjusting the prompt (from the demo we can only see that it is triggered by an "explain why" prompt)?


On the other hand, the reproduced results also show error-prone output; for example, massive numbers of [SEG] tokens are generated in the console (a small counting sketch follows below). That's also one of my questions.
[screenshot]
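
To quantify the flood, here is a small sketch, assuming the usual LISA setup where "[SEG]" was added to the tokenizer as a single token; tokenizer and output_ids stand for whatever chat.py produces (both names are assumptions):

```python
import torch

def count_seg_tokens(output_ids: torch.Tensor, tokenizer) -> int:
    # A sane segmentation answer should contain exactly one [SEG] token.
    seg_token_id = tokenizer.convert_tokens_to_ids("[SEG]")
    return int((output_ids == seg_token_id).sum())
```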

jifeng35 commented

I don't think you need to dwell on this. If you want the effect shown in the demo, consider looking into LISA++; LISA++ can handle the demo's tasks well, and its dialogue is more natural.

HuangChiEn commented

I don't think you need to dwell on this. If you want the effect shown in the demo, consider looking into LISA++; LISA++ can handle the demo's tasks well, and its dialogue is more natural.

I did find the paper you mentioned, but I can't find its GitHub. You said LISA++ can handle the demo's tasks well; do you know where to get the LISA++ GitHub (source code and weights for reproduction)?

jifeng35 commented

It seems there really is no code or weights; the LISA++ paper doesn't mention them either.

HuangChiEn commented Oct 23, 2024

Then let's wait and see whether the LISA authors will give any comment ~
