Paper provided example can not be reproduced !! #153

Open

HuangChiEn opened this issue Oct 15, 2024 · 6 comments

HuangChiEn commented Oct 15, 2024

I ran the example from the README.md.
[screenshot]

  1. The load-8bit response is acceptable, but it didn't give me any explanation.
    [screenshot]

  2. I figured 8-bit loading might degrade performance, so I ran in fp16 mode only (no aggressive quantization), but I got even worse results: it still gives no explanation. (See the sketch of the two loading modes below.)
    [screenshot]
    [screenshot]

Also, a bunch of out-of-control [SEG] tokens pop out. Why?
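
For reference, here is a minimal sketch of the two loading modes I am comparing, written against the plain transformers API rather than LISA's actual chat.py (the checkpoint tag and kwargs are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM

MODEL = "xinlai/LISA-13B-llama2-v1"  # assumed checkpoint tag

# Case 1: bitsandbytes int8 quantization (the load-8bit run).
model_8bit = AutoModelForCausalLM.from_pretrained(
    MODEL, load_in_8bit=True, device_map="auto"
)

# Case 2: plain fp16, no quantization (the second run).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
```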

About versions and packages:

accelerate                1.0.1
aiofiles                  23.2.1
aiohappyeyeballs          2.4.3
aiohttp                   3.10.10
aiosignal                 1.3.1
altair                    5.4.1
annotated-types           0.7.0
anyio                     4.6.2
async-timeout             4.0.3
attrs                     24.2.0
autocommand               2.2.2
backports.tarfile         1.2.0
bitsandbytes              0.41.1
certifi                   2024.8.30
charset-normalizer        3.4.0
click                     8.1.7
contourpy                 1.3.0
cycler                    0.12.1
deepspeed                 0.15.2
einops                    0.4.1
exceptiongroup            1.2.2
fastapi                   0.100.1
ffmpy                     0.4.0
filelock                  3.16.1
flash_attn                2.6.3
fonttools                 4.54.1
frozenlist                1.4.1
fsspec                    2024.9.0
gradio                    3.39.0
gradio_client             1.3.0
grpcio                    1.66.2
h11                       0.14.0
hjson                     3.1.0
httpcore                  1.0.6
httpx                     0.27.2
huggingface-hub           0.25.2
idna                      3.10
importlib_metadata        8.0.0
importlib_resources       6.4.5
inflect                   7.3.1
jaraco.collections        5.1.0
jaraco.context            5.3.0
jaraco.functools          4.0.1
jaraco.text               3.12.1
Jinja2                    3.1.4
joblib                    1.4.2
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
kiwisolver                1.4.7
linkify-it-py             2.0.3
markdown-it-py            2.2.0
markdown2                 2.4.10
MarkupSafe                2.1.5
matplotlib                3.9.2
mdit-py-plugins           0.3.3
mdurl                     0.1.2
more-itertools            10.3.0
mpmath                    1.3.0
msgpack                   1.1.0
multidict                 6.1.0
narwhals                  1.9.3
networkx                  3.2.1
ninja                     1.11.1.1
numpy                     1.24.2
nvidia-ml-py              12.560.30
openai                    0.27.8
opencv-python             4.8.0.74
orjson                    3.10.7
packaging                 24.1
pandas                    2.2.3
peft                      0.4.0
Pillow                    9.4.0
pip                       24.2
platformdirs              4.2.2
propcache                 0.2.0
protobuf                  5.28.2
psutil                    6.0.0
py-cpuinfo                9.0.0
pycocotools               2.0.6
pydantic                  2.9.2
pydantic_core             2.23.4
pydub                     0.25.1
pyparsing                 3.2.0
python-dateutil           2.9.0.post0
python-multipart          0.0.12
pytz                      2024.2
PyYAML                    6.0.2
ray                       2.6.1
referencing               0.35.1
regex                     2024.9.11
requests                  2.31.0
rpds-py                   0.20.0
sacremoses                0.1.1
safetensors               0.4.5
scipy                     1.11.2
semantic-version          2.10.0
sentencepiece             0.2.0
setuptools                75.1.0
shortuuid                 1.0.11
six                       1.16.0
sniffio                   1.3.1
starlette                 0.27.0
sympy                     1.12
tokenizers                0.15.2
tomli                     2.0.1
torch                     2.1.2+cu121
torchaudio                2.1.2+cu121
torchvision               0.16.2+cu121
tqdm                      4.64.1
transformers              4.35.2
triton                    2.1.0
typeguard                 4.3.0
typing_extensions         4.12.2
tzdata                    2024.2
uc-micro-py               1.0.3
urllib3                   2.2.3
uvicorn                   0.23.2
websockets                11.0.3
wheel                     0.44.0
yarl                      1.15.2
zipp                      3.20.2

I had encountered several issues, so I pinned the transformers version to match LLaVA's and modified the code according to this issue:
haotian-liu/LLaVA#968

The real problem that I'm afraid affects the decoding strategy is this one:
salesforce/LAVIS#571

So I have replaced every private function (i.e. _expand_mask) with its object-based equivalent to get past Python's static checks.
Moreover, I placed a RuntimeError at the beginning of every function that uses them (and I never hit any RuntimeError).
That means none of these private functions are ever called during inference.
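
Concretely, a minimal sketch of both the replacement and the probe, assuming transformers 4.35.x, where the removed private helper's logic lives on AttentionMaskConverter (verify against your installed version):

```python
import torch
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

# Replacement: stand-in for the _expand_mask helper that was removed
# from transformers.models.llama.modeling_llama.
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: int = None) -> torch.Tensor:
    return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)

# Probe: the RuntimeError trick described above. Put this raise at the
# top of each suspect function; if inference never crashes, the function
# is never reached and cannot affect the decoding strategy.
def _expand_mask_probed(*args, **kwargs):
    raise RuntimeError("_expand_mask was reached during inference")
```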


Any suggestions would be appreciated!!

HuangChiEn changed the title from "Less performance then paper described" to "Paper provided example doesn't work ?" on Oct 15, 2024
HuangChiEn changed the title from "Paper provided example doesn't work ?" to "Paper provided example can not be reproduced !!" on Oct 15, 2024

jifeng35 commented Oct 22, 2024

Didn't you see the training data?
The GT that LLaVA is trained to output is "Sure, it's the <seg>."
It was trained to answer like that, so you cannot force it to output an explanation.
That said, the authors' demo does seem to be wrong.
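
For intuition, a hypothetical sketch of what such answer templates tend to look like in this kind of training code (the strings and names are illustrative, not quoted from the LISA repo):

```python
import random

# Illustrative GT answer templates for a segmentation request. None of
# them contains an explanation, so a model fine-tuned on them has no
# explanatory phrasing to imitate unless the data explicitly include it.
ANSWER_TEMPLATES = [
    "Sure, it is [SEG].",
    "It is [SEG].",
    "[SEG].",
]

def sample_gt_answer() -> str:
    # The ground-truth text the LLM is tuned to emit for a seg request.
    return random.choice(ANSWER_TEMPLATES)
```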


HuangChiEn commented Oct 22, 2024

Didn't you see the training data? The GT that LLaVA is trained to output is "Sure, it's the <seg>." It was trained to answer like that, so you cannot force it to output an explanation. That said, the authors' demo does seem to be wrong.

Yeah, I agree with your point; the model can hardly generate phrasing it never saw in the training set.
However, the demo doesn't give just one such example:
[screenshot]
[screenshot]

So I wonder how to reproduce those inference results by adjusting the prompt (from the demo we can only see that it is triggered by an "explain why" prompt)?


On the other hand, the reproduced results also show error-prone output; for example, massive numbers of [SEG] tokens are generated in the console (a small counting sketch follows below). That's also one of my questions.
[screenshot]
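
To quantify the flood, here is a small sketch, assuming the usual LISA setup where "[SEG]" was added to the tokenizer as a single token; tokenizer and output_ids stand for whatever chat.py produces (both names are assumptions):

```python
import torch

def count_seg_tokens(output_ids: torch.Tensor, tokenizer) -> int:
    # A sane segmentation answer should contain exactly one [SEG] token.
    seg_token_id = tokenizer.convert_tokens_to_ids("[SEG]")
    return int((output_ids == seg_token_id).sum())
```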

jifeng35 commented

I don't think you need to dwell on this. If you want the effect shown in the demo, consider looking into LISA++; LISA++ can handle the demo's tasks well, and its dialogue is more natural.

HuangChiEn commented

I don't think you need to dwell on this. If you want the effect shown in the demo, consider looking into LISA++; LISA++ can handle the demo's tasks well, and its dialogue is more natural.

I did find the paper you mentioned, but I can't find its GitHub. You said LISA++ can handle the demo's tasks well; do you know where to get the LISA++ GitHub (source code and weights for reproduction)?

jifeng35 commented

It seems there really is no code or weights; the LISA++ paper doesn't mention them either.

HuangChiEn commented Oct 23, 2024

Then let's wait and see whether the LISA authors will give any comment ~
