This is the official implementation for our ICCV-2023 paper
"HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models"
Eslam Abdelrahman, Pengzhan Sun, Xiaoqian Shen, Faizan Farooq Khan, Li Erran Li, and Mohamed Elhoseiny
- (July 13, 2023): The paper was accepted at ICCV-2023.
- (April 11, 2023): The paper was published on arXiv.
- Holistic skills evaluation. Rather than focusing on isolated metrics such as accuracy, we measure 13 skills, which can be grouped into five major categories: accuracy, robustness, generalization, fairness, and bias.
- Broad scenario coverage. HRS-Bench covers 50 applications, e.g., fashion, animals, transportation, food, and clothes.
- Standardization. We propose a unified benchmark in which existing models are evaluated fairly across a wide range of metrics.
- Holistic prompts generation.
- Stable-Diffusion V1
- Stable-Diffusion V2
- DALL·E V2
- Structure-Diffusion
- CogView V2
- Glide
- Paella
- minDALL-E
- DALLEMini
- Python >= 3.7
- PyTorch >= 1.7.0
- Install other common packages (numpy, pytorch_transformers, etc.)
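The version requirements above can be checked before running anything else. The following is a minimal sketch, not part of the repository: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the only assumptions taken from this README are the minimum versions (Python 3.7, PyTorch 1.7.0).

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.0') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version, minimum):
    """Return True if the version string is at least the minimum tuple."""
    return version_tuple(version) >= tuple(minimum)


def check_environment():
    """Report whether Python and PyTorch satisfy the stated minimums."""
    report = {"python": sys.version_info[:3] >= (3, 7, 0)}
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        report["torch"] = False
    else:
        report["torch"] = meets_minimum(torch.__version__, (1, 7, 0))
    return report
```

Calling `check_environment()` returns a dictionary with one boolean per requirement, so a setup script can stop early if either entry is `False`.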
- First, download our prompts, which cover the 13 skills, from here.
- Each skill has its own CSV file containing the prompts and the ground truth (GT) used during the evaluation phase.
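Since each skill ships as a separate CSV file, the prompts can be loaded into one dictionary keyed by skill. This is a rough sketch that assumes only that each file has a header row. The function names are hypothetical, and no column names are assumed: rows are keyed by whatever headers the downloaded files actually contain.

```python
import csv
import io
from pathlib import Path


def parse_skill_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_skill_csv(csv_path):
    """Read one skill's CSV file from disk."""
    return parse_skill_csv(Path(csv_path).read_text(encoding="utf-8"))


def load_all_skills(prompts_dir):
    """Map each CSV file name (without extension) to its parsed rows."""
    return {
        path.stem: load_skill_csv(path)
        for path in sorted(Path(prompts_dir).glob("*.csv"))
    }
```

For example, `load_all_skills("prompts")` (where `prompts` is a placeholder for wherever the download was extracted) yields one entry per skill file, each holding the prompts alongside their ground-truth fields.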
You don't need to run the prompt generation code, as we already provide the generated prompts, which can be downloaded from this link.
However, we also provide all the generation code.
Follow the detailed instructions in the README file to run our evaluation scripts for all skills.
The project is inspired by the great language benchmark HELM.
Please consider citing our paper if you find it useful.
@misc{2304.05390,
Author = {Eslam Mohamed Bakr and Pengzhan Sun and Xiaoqian Shen and Faizan Farooq Khan and Li Erran Li and Mohamed Elhoseiny},
Title = {HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models},
Year = {2023},
Eprint = {arXiv:2304.05390},
}