
OkVQA Evaluation #40

Open
piyushkhanna00705 opened this issue Nov 28, 2023 · 1 comment
piyushkhanna00705 commented Nov 28, 2023

Thanks for the great work! I love how interpretable ViperGPT is! I am trying to evaluate the results on the OkVQA dataset, but I am facing a similar issue to Issue #24, where the model generates a full sentence instead of the short (one-word) answer required for it to count as correct under exact-match accuracy. I also tried being a bit "lenient" when calculating the accuracy, marking a prediction as correct if the answer word appears anywhere in the model's full-sentence prediction, but I still got an accuracy lower than the one reported in the paper.

Here are the evaluation metrics from my experiments:
Exact-match accuracy (a prediction is wrong unless it exactly matches the answer): 9.435%
"Lenient" accuracy (a prediction is correct if the answer word appears anywhere in the model's full-length prediction): 21.62%

I am using GPT-3.5 for code generation and blip2-flan-t5-xl for visual queries. Could using blip2-flan-t5-xl instead of blip2-flan-t5-xxl have caused such a large drop in accuracy? I would have expected the "lenient" accuracy to be at least as high as the number reported in the paper, since it may even count some answers as correct when they are not.
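
For reference, here is a minimal sketch of how the two metrics above can be computed; the function and variable names (`prediction`, `gt_answers`) are placeholders for illustration and are not taken from the ViperGPT codebase:

```python
def exact_match(prediction: str, gt_answers: list[str]) -> bool:
    """Correct only if the normalized prediction equals one of the ground-truth answers."""
    pred = prediction.strip().lower()
    return any(pred == ans.strip().lower() for ans in gt_answers)


def lenient_match(prediction: str, gt_answers: list[str]) -> bool:
    """Correct if any ground-truth answer appears as a substring of the full-sentence prediction."""
    pred = prediction.strip().lower()
    return any(ans.strip().lower() in pred for ans in gt_answers)


def accuracy(predictions: list[str], answers: list[list[str]], match_fn) -> float:
    """Percentage of examples where match_fn considers the prediction correct."""
    correct = sum(match_fn(p, a) for p, a in zip(predictions, answers))
    return 100.0 * correct / len(predictions)
```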

surisdi (Contributor)
surisdi commented Dec 22, 2023

Hi, we have updated the repository with the evaluation code. Additionally, a drop in performance is expected if BLIP-2 Flan-T5 XL is used instead of XXL, and also if GPT-3.5 is used instead of Codex (which we used in our experiments). We did not run the experiments with GPT-3.5, so we do not have numbers for how much not using Codex affects the results, but qualitatively GPT-3.5 is not as good (it may just be a matter of prompt engineering, since GPT-3.5 is not code-specific).

That said, I would suggest using our evaluation code so that there are fewer differences with respect to our experiments, which makes it easier to narrow down where the gap comes from.
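
For context, OK-VQA is usually scored with the standard VQA-style soft accuracy rather than strict exact match. The sketch below shows that metric in simplified form; it is not the repository's actual evaluation script, and the official implementation additionally averages over leave-one-out subsets of the ten human answers and normalizes punctuation and articles:

```python
def vqa_soft_accuracy(prediction: str, gt_answers: list[str]) -> float:
    """Simplified VQA-style soft accuracy: a predicted answer earns full credit
    if at least 3 of the (typically 10) annotators gave that answer."""
    pred = prediction.strip().lower()
    matches = sum(pred == ans.strip().lower() for ans in gt_answers)
    return min(matches / 3.0, 1.0)
```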
