Add MMMU-test results.
kentang-mit authored May 4, 2024
1 parent 73bac60 commit 7fc3f55
Showing 1 changed file with 14 additions and 15 deletions: README.md
@@ -27,21 +27,20 @@ VILA is a visual language model (VLM) pretrained with interleaved image-text data…

### Image QA Benchmarks

Removed:

| Model | Prec. | VQAv2 | GQA | VizWiz | SQA-I | VQA-T | POPE | MME | MMB | MMB-CN | SEED | SEED-I | MMMU | llava-bench | MM-Vet | Average (w/o MME) |
| ----- | ----- | ----- | ---- | ------ | ----- | ----- | ---- | ------- | ---- | ------ | ---- | ------ | ---- | ----------- | ------ | ----------------- |
| VILA1.5-3B | fp16 | 80.4 | 61.5 | 53.5 | 69.0 | 60.4 | 85.9 | 1442.44 | 63.4 | 52.7 | 60.9 | 67.9 | 33.3 | 75.9 | 35.4 | 61.6 |
| VILA1.5-3B-AWQ | int4 | 80.0 | 61.1 | 53.8 | 67.8 | 60.4 | 85.9 | 1437.34 | 63.3 | 51.4 | 59.8 | 66.6 | 32.7 | 75.0 | 37.3 | 61.2 |
| VILA1.5-3B-S2 | fp16 | 79.8 | 61.4 | 61.3 | 69.6 | 63.4 | 85.3 | 1431.65 | 62.8 | 52.2 | 60.0 | 66.4 | 32.8 | 76.7 | 38.6 | 62.3 |
| VILA1.5-3B-S2-AWQ | int4 | 79.4 | 61.3 | 62.3 | 69.2 | 63.0 | 85.8 | 1417.06 | 61.6 | 51.5 | 59.1 | 65.7 | 33.4 | 77.1 | 36.7 | 62.0 |
| Llama-3-VILA1.5-8B | fp16 | 80.9 | 61.9 | 58.7 | 79.9 | 66.3 | 84.4 | 1577.01 | 72.3 | 66.2 | 64.2 | 71.4 | 36.9 | 80.0 | 38.3 | 66.3 |
| Llama-3-VILA1.5-8B-AWQ | int4 | 80.3 | 61.7 | 59.3 | 79.0 | 65.4 | 82.9 | 1593.65 | 71.0 | 64.9 | 64.0 | 71.1 | 36.0 | 79.0 | 37.2 | 65.5 |
| VILA1.5-13B | fp16 | 82.8 | 64.3 | 62.6 | 80.1 | 65.0 | 86.3 | 1569.55 | 74.9 | 66.3 | 65.1 | 72.6 | 37.9 | 80.8 | 44.3 | 67.9 |
| VILA1.5-13B-AWQ | int4 | 82.7 | 64.5 | 63.3 | 79.7 | 64.7 | 86.7 | 1531.35 | 74.7 | 66.7 | 65.1 | 72.6 | 37.8 | 81.9 | 46.4 | 68.2 |
| VILA1.5-40B | fp16 | 84.3 | 64.6 | 62.2 | 87.2 | 73.6 | 87.3 | 1726.82 | 82.4 | 80.2 | 69.1 | 75.8 | 51.9 | 81.3 | 53.0 | 73.3 |
| VILA1.5-40B-AWQ | int4 | 84.1 | 64.4 | 61.3 | 86.7 | 73.2 | 88.2 | 1714.79 | 83.2 | 79.6 | 68.9 | 75.6 | 49.3 | 83.0 | 51.4 | 73.0 |


<sup>NOTE: VQAv2 and VizWiz results are on test-dev; for MMMU we report validation-set accuracy.</sup>

Added:

| Model | Prec. | VQAv2 | GQA | VizWiz | SQA-I | VQA-T | POPE | MME | MMB | MMB-CN | SEED | SEED-I | MMMU (val) | MMMU (test) | llava-bench | MM-Vet | Average |
| ----- | ----- | ----- | ---- | ------ | ----- | ----- | ---- | ------- | ---- | ------ | ---- | ------ | ---------- | ----------- | ----------- | ------ | ------- |
| VILA1.5-3B | fp16 | 80.4 | 61.5 | 53.5 | 69.0 | 60.4 | 85.9 | 1442.44 | 63.4 | 52.7 | 60.9 | 67.9 | 33.3 | 30.8 | 75.9 | 35.4 | 60.2 |
| VILA1.5-3B-AWQ | int4 | 80.0 | 61.1 | 53.8 | 67.8 | 60.4 | 85.9 | 1437.34 | 63.3 | 51.4 | 59.8 | 66.6 | 32.7 | 31.1 | 75.0 | 37.3 | 59.9 |
| VILA1.5-3B-S2 | fp16 | 79.8 | 61.4 | 61.3 | 69.6 | 63.4 | 85.3 | 1431.65 | 62.8 | 52.2 | 60.0 | 66.4 | 32.8 | 31.3 | 76.7 | 38.6 | 60.9 |
| VILA1.5-3B-S2-AWQ | int4 | 79.4 | 61.3 | 62.3 | 69.2 | 63.0 | 85.8 | 1417.06 | 61.6 | 51.5 | 59.1 | 65.7 | 33.4 | 30.4 | 77.1 | 36.7 | 60.5 |
| Llama-3-VILA1.5-8B | fp16 | 80.9 | 61.9 | 58.7 | 79.9 | 66.3 | 84.4 | 1577.01 | 72.3 | 66.2 | 64.2 | 71.4 | 36.9 | 36.0 | 80.0 | 38.3 | 65.1 |
| Llama-3-VILA1.5-8B-AWQ | int4 | 80.3 | 61.7 | 59.3 | 79.0 | 65.4 | 82.9 | 1593.65 | 71.0 | 64.9 | 64.0 | 71.1 | 36.0 | 36.1 | 79.0 | 37.2 | 64.5 |
| VILA1.5-13B | fp16 | 82.8 | 64.3 | 62.6 | 80.1 | 65.0 | 86.3 | 1569.55 | 74.9 | 66.3 | 65.1 | 72.6 | 37.9 | 33.6 | 80.8 | 44.3 | 66.3 |
| VILA1.5-13B-AWQ | int4 | 82.7 | 64.5 | 63.3 | 79.7 | 64.7 | 86.7 | 1531.35 | 74.7 | 66.7 | 65.1 | 72.6 | 37.8 | 34.0 | 81.9 | 46.4 | 66.5 |
| VILA1.5-40B | fp16 | 84.3 | 64.6 | 62.2 | 87.2 | 73.6 | 87.3 | 1726.82 | 82.4 | 80.2 | 69.1 | 75.8 | 51.9 | 46.9 | 81.3 | 53.0 | 72.4 |
| VILA1.5-40B-AWQ | int4 | 84.1 | 64.4 | 61.3 | 86.7 | 73.2 | 88.2 | 1714.79 | 83.2 | 79.6 | 68.9 | 75.6 | 49.3 | 46.2 | 83.0 | 51.4 | 72.1 |

<sup>NOTE: VQAv2 and VizWiz results are on test-dev. The average accuracy is computed over all datasets, with MME scores divided by 20 so they fall on a comparable 0-100 scale.</sup>
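
As a sanity check, the Average column can be reproduced from the per-benchmark numbers. The minimal Python sketch below (not part of the repo) does this for the VILA1.5-3B row, rescaling MME as described in the note:

```python
# Minimal sketch: recompute the Average column for VILA1.5-3B from the
# per-benchmark scores in the table above. MME is scored out of ~2000,
# so it is divided by 20 before averaging with the 0-100 benchmarks.
scores = {
    "VQAv2": 80.4, "GQA": 61.5, "VizWiz": 53.5, "SQA-I": 69.0,
    "VQA-T": 60.4, "POPE": 85.9, "MME": 1442.44, "MMB": 63.4,
    "MMB-CN": 52.7, "SEED": 60.9, "SEED-I": 67.9,
    "MMMU (val)": 33.3, "MMMU (test)": 30.8,
    "llava-bench": 75.9, "MM-Vet": 35.4,
}
normalized = [v / 20 if name == "MME" else v for name, v in scores.items()]
average = sum(normalized) / len(normalized)
print(f"{average:.1f}")  # 60.2, matching the Average column
```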

### Video QA Benchmarks

