Downloading the Data

MIMIC-CXR

Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:

gcloud auth login
mkdir MIMIC-CXR-JPG
gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG

To obtain gender information for each patient, you also need access to MIMIC-IV. Download core/patients.csv.gz and core/admissions.csv.gz and place both files in the MIMIC-CXR-JPG directory.
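As a quick check that the MIMIC-IV patient file is in place and readable, the gender column can be loaded with the standard library alone. This is a minimal sketch, not the project's own preprocessing: the helper name load_patient_gender is hypothetical, and it assumes the subject_id and gender columns of the MIMIC-IV patients table.

```python
import csv
import gzip
from pathlib import Path


def load_patient_gender(mimic_dir):
    """Return a {subject_id: gender} dict read from patients.csv.gz.

    Hypothetical helper for illustration; assumes the file sits directly
    inside mimic_dir and has `subject_id` and `gender` columns.
    """
    path = Path(mimic_dir) / "patients.csv.gz"
    if not path.is_file():
        raise FileNotFoundError(f"expected {path}; download it from MIMIC-IV first")
    # Open the gzip-compressed CSV in text mode and read it row by row.
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)
        return {row["subject_id"]: row["gender"] for row in reader}
```

Calling load_patient_gender("MIMIC-CXR-JPG") should return one entry per patient. A FileNotFoundError means the file has not yet been copied into the directory.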

CheXpert

Sign up with your email address here.
Download either the original or the downsampled dataset (we recommend the downsampled version - CheXpert-v1.0-small.zip) and extract it.
Register for an account and download the CheXpert demographics data here.
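The extraction step can also be done from Python, which is convenient on machines without an unzip utility. This is a minimal sketch using the standard zipfile module. The helper name extract_archive and the destination directory are placeholders, not part of the project.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the top-level entry names.

    Hypothetical helper for illustration only.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Collect the first path component of every archive member.
        return sorted({name.split("/")[0] for name in archive.namelist()})
```

For example, extract_archive("CheXpert-v1.0-small.zip", "data") unpacks the downsampled archive into a data directory (a placeholder name) and returns the names of the top-level entries it contains.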

In cxr_fairness/data/Constants.py, update image_paths to point to the two directories that you downloaded, and CXP_details to be the path to the CheXpert demographics file.
Run python -m cxr_fairness.data.preprocess.preprocess.
(Optional) If you are training many models, it may be faster to cache all images on disk as binary 224x224 files. This is especially true if you are using the original (non-downsampled) versions of the datasets. In this case, update the cache_dir path in cxr_fairness/data/Constants.py and then run python -m cxr_fairness.data.preprocess.cache_data, optionally parallelizing over --env_id {0, 1} for speed. To use the cached files, pass --use_cache to train.py.
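Before running the preprocessing step above, it can help to confirm that the configured paths actually exist. The sketch below assumes image_paths is a dict mapping a dataset name to a directory and CXP_details is a path to a single file. The keys and paths shown are placeholders, the real entries are those in cxr_fairness/data/Constants.py, and find_missing_paths is a hypothetical helper.

```python
from pathlib import Path

# Placeholder values for illustration; the real settings live in
# cxr_fairness/data/Constants.py and may use different keys.
image_paths = {
    "MIMIC": "/path/to/MIMIC-CXR-JPG",
    "CXP": "/path/to/CheXpert-v1.0-small",
}
CXP_details = "/path/to/chexpert_demographics_file"


def find_missing_paths(image_paths, details_file):
    """Return a description of every configured path that does not exist."""
    missing = [
        f"{name}: {path}"
        for name, path in image_paths.items()
        if not Path(path).is_dir()
    ]
    if not Path(details_file).is_file():
        missing.append(f"CXP_details: {details_file}")
    return missing
```

An empty list from find_missing_paths(image_paths, CXP_details) means every configured location was found. Otherwise each returned entry names a setting to correct before preprocessing.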