Add MMMU-test results.
kentang-mit authored May 4, 2024
1 parent 73bac60 commit 7fc3f55
Showing 1 changed file with 14 additions and 15 deletions: README.md
@@ -27,21 +27,20 @@ VILA is a visual language model (VLM) pretrained with interleaved image-text data…

### Image QA Benchmarks

Removed:

| Model | Prec. | VQAv2 | GQA | VizWiz | SQA-I | VQA-T | POPE | MME | MMB | MMB-CN | SEED | SEED-I | MMMU | llava-bench | MM-Vet | Average (w/o MME) |
| ----- | ----- | ----- | ---- | ------ | ----- | ----- | ---- | ------- | ---- | ------ | ---- | ------ | ---- | ----------- | ------ | ----------------- |
| VILA1.5-3B | fp16 | 80.4 | 61.5 | 53.5 | 69.0 | 60.4 | 85.9 | 1442.44 | 63.4 | 52.7 | 60.9 | 67.9 | 33.3 | 75.9 | 35.4 | 61.6 |
| VILA1.5-3B-AWQ | int4 | 80.0 | 61.1 | 53.8 | 67.8 | 60.4 | 85.9 | 1437.34 | 63.3 | 51.4 | 59.8 | 66.6 | 32.7 | 75.0 | 37.3 | 61.2 |
| VILA1.5-3B-S2 | fp16 | 79.8 | 61.4 | 61.3 | 69.6 | 63.4 | 85.3 | 1431.65 | 62.8 | 52.2 | 60.0 | 66.4 | 32.8 | 76.7 | 38.6 | 62.3 |
| VILA1.5-3B-S2-AWQ | int4 | 79.4 | 61.3 | 62.3 | 69.2 | 63.0 | 85.8 | 1417.06 | 61.6 | 51.5 | 59.1 | 65.7 | 33.4 | 77.1 | 36.7 | 62.0 |
| Llama-3-VILA1.5-8B | fp16 | 80.9 | 61.9 | 58.7 | 79.9 | 66.3 | 84.4 | 1577.01 | 72.3 | 66.2 | 64.2 | 71.4 | 36.9 | 80.0 | 38.3 | 66.3 |
| Llama-3-VILA1.5-8B-AWQ | int4 | 80.3 | 61.7 | 59.3 | 79.0 | 65.4 | 82.9 | 1593.65 | 71.0 | 64.9 | 64.0 | 71.1 | 36.0 | 79.0 | 37.2 | 65.5 |
| VILA1.5-13B | fp16 | 82.8 | 64.3 | 62.6 | 80.1 | 65.0 | 86.3 | 1569.55 | 74.9 | 66.3 | 65.1 | 72.6 | 37.9 | 80.8 | 44.3 | 67.9 |
| VILA1.5-13B-AWQ | int4 | 82.7 | 64.5 | 63.3 | 79.7 | 64.7 | 86.7 | 1531.35 | 74.7 | 66.7 | 65.1 | 72.6 | 37.8 | 81.9 | 46.4 | 68.2 |
| VILA1.5-40B | fp16 | 84.3 | 64.6 | 62.2 | 87.2 | 73.6 | 87.3 | 1726.82 | 82.4 | 80.2 | 69.1 | 75.8 | 51.9 | 81.3 | 53.0 | 73.3 |
| VILA1.5-40B-AWQ | int4 | 84.1 | 64.4 | 61.3 | 86.7 | 73.2 | 88.2 | 1714.79 | 83.2 | 79.6 | 68.9 | 75.6 | 49.3 | 83.0 | 51.4 | 73.0 |


<sup>NOTE: VQAv2 and VizWiz results are on test-dev; for MMMU we report validation-set accuracy.</sup>

Added:

| Model | Prec. | VQAv2 | GQA | VizWiz | SQA-I | VQA-T | POPE | MME | MMB | MMB-CN | SEED | SEED-I | MMMU (val) | MMMU (test) | llava-bench | MM-Vet | Average |
| ----- | ----- | ----- | ---- | ------ | ----- | ----- | ---- | ------- | ---- | ------ | ---- | ------ | ---------- | ----------- | ----------- | ------ | ------- |
| VILA1.5-3B | fp16 | 80.4 | 61.5 | 53.5 | 69.0 | 60.4 | 85.9 | 1442.44 | 63.4 | 52.7 | 60.9 | 67.9 | 33.3 | 30.8 | 75.9 | 35.4 | 60.2 |
| VILA1.5-3B-AWQ | int4 | 80.0 | 61.1 | 53.8 | 67.8 | 60.4 | 85.9 | 1437.34 | 63.3 | 51.4 | 59.8 | 66.6 | 32.7 | 31.1 | 75.0 | 37.3 | 59.9 |
| VILA1.5-3B-S2 | fp16 | 79.8 | 61.4 | 61.3 | 69.6 | 63.4 | 85.3 | 1431.65 | 62.8 | 52.2 | 60.0 | 66.4 | 32.8 | 31.3 | 76.7 | 38.6 | 60.9 |
| VILA1.5-3B-S2-AWQ | int4 | 79.4 | 61.3 | 62.3 | 69.2 | 63.0 | 85.8 | 1417.06 | 61.6 | 51.5 | 59.1 | 65.7 | 33.4 | 30.4 | 77.1 | 36.7 | 60.5 |
| Llama-3-VILA1.5-8B | fp16 | 80.9 | 61.9 | 58.7 | 79.9 | 66.3 | 84.4 | 1577.01 | 72.3 | 66.2 | 64.2 | 71.4 | 36.9 | 36.0 | 80.0 | 38.3 | 65.1 |
| Llama-3-VILA1.5-8B-AWQ | int4 | 80.3 | 61.7 | 59.3 | 79.0 | 65.4 | 82.9 | 1593.65 | 71.0 | 64.9 | 64.0 | 71.1 | 36.0 | 36.1 | 79.0 | 37.2 | 64.5 |
| VILA1.5-13B | fp16 | 82.8 | 64.3 | 62.6 | 80.1 | 65.0 | 86.3 | 1569.55 | 74.9 | 66.3 | 65.1 | 72.6 | 37.9 | 33.6 | 80.8 | 44.3 | 66.3 |
| VILA1.5-13B-AWQ | int4 | 82.7 | 64.5 | 63.3 | 79.7 | 64.7 | 86.7 | 1531.35 | 74.7 | 66.7 | 65.1 | 72.6 | 37.8 | 34.0 | 81.9 | 46.4 | 66.5 |
| VILA1.5-40B | fp16 | 84.3 | 64.6 | 62.2 | 87.2 | 73.6 | 87.3 | 1726.82 | 82.4 | 80.2 | 69.1 | 75.8 | 51.9 | 46.9 | 81.3 | 53.0 | 72.4 |
| VILA1.5-40B-AWQ | int4 | 84.1 | 64.4 | 61.3 | 86.7 | 73.2 | 88.2 | 1714.79 | 83.2 | 79.6 | 68.9 | 75.6 | 49.3 | 46.2 | 83.0 | 51.4 | 72.1 |

<sup>NOTE: VQAv2 and VizWiz results are on test-dev. The average accuracy is computed over all datasets, with MME scores divided by 20 so they fall on a comparable 0-100 scale.</sup>
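
As a sanity check, the Average column can be reproduced from the per-benchmark numbers. The minimal Python sketch below (not part of the repo) does this for the VILA1.5-3B row, rescaling MME as described in the note:

```python
# Minimal sketch: recompute the Average column for VILA1.5-3B from the
# per-benchmark scores in the table above. MME is scored out of ~2000,
# so it is divided by 20 before averaging with the 0-100 benchmarks.
scores = {
    "VQAv2": 80.4, "GQA": 61.5, "VizWiz": 53.5, "SQA-I": 69.0,
    "VQA-T": 60.4, "POPE": 85.9, "MME": 1442.44, "MMB": 63.4,
    "MMB-CN": 52.7, "SEED": 60.9, "SEED-I": 67.9,
    "MMMU (val)": 33.3, "MMMU (test)": 30.8,
    "llava-bench": 75.9, "MM-Vet": 35.4,
}
normalized = [v / 20 if name == "MME" else v for name, v in scores.items()]
average = sum(normalized) / len(normalized)
print(f"{average:.1f}")  # 60.2, matching the Average column
```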

### Video QA Benchmarks

