Data description of the Emory Knee Radiograph (MRKR) Dataset hosted by Open Data on AWS.
An example notebook is included.
The Emory Knee Radiograph (MRKR) dataset is a large, demographically diverse collection of
503,261 knee radiographs from 83,011 patients, 40% of whom are African American. This dataset
provides imaging data in DICOM format along with detailed clinical information, including
patient-reported pain scores, diagnostic codes, and procedural codes, which are not commonly available in
similar datasets. The MRKR dataset also features imaging metadata such as image laterality, view type,
and presence of hardware, enhancing its value for research and model development. MRKR addresses
significant gaps in existing datasets by offering a more representative sample for studying osteoarthritis
and related outcomes, particularly among minority populations, thereby providing a valuable resource
for clinicians and researchers.
The full dataset is about 2.3 TB and includes DICOM files (2.3 TB) and seven CSV files (2.7 GB) containing clinical data and image metadata.
The dataset will be hosted with Open Data on AWS and will be available soon.

Characteristic | Value |
---|---|
Total Patients | 83,011 (100%) |
Gender | |
Female | 51,175 (61.6%) |
Male | 31,836 (38.4%) |
Age, years | |
Mean | 56.6 (SD 16.6) |
Median | 58 |
Race demographics | |
White | 36,927 (44.5%) |
Black | 33,503 (40.4%) |
Asian | 2,893 (3.5%) |
Unknown/Unreported | 8,751 (10.5%) |
Other | 937 (1.1%) |
Ethnicity | |
Hispanic | 2,501 (3.0%) |
Non-Hispanic | 66,378 (80.0%) |
Unknown/Unreported | 14,132 (17.0%) |
Clinical outcomes | |
Arthroplasty | 14,843 (17.9%) |

Filename: MRKR_CPT.csv
File size: 178 MB
Total rows: 6,216,190
Description: This table contains information regarding all CPT codes for a patient and corresponding dates.

Field Name | Data Type | Description |
---|---|---|
empi_anon | Integer (8 digits) | Unique patient identification number (83,011 patients) |
cpt_code | String (5 characters) | Current Procedural Terminology code used in coding of medical services and procedures for billing (7,166 CPT codes) |
cpt_group_modifier | String | Used to provide further information regarding service or procedure. Most CPT codes do not include modifier data. If there is modifier data, it is often used to indicate the laterality of a procedure (left or right). There can be multiple modifiers for a single CPT code entry. |
date_anon | Date | Date when the associated procedure or service occurred. |
age_at_procedure | Integer | Age when the procedure was performed. |
Filename: MRKR_CPT_dictionary.csv
File size: 754 KB
Total rows: 7,166
Description: A lookup table between CPT codes and corresponding descriptions.

Field Name | Data Type | Description |
---|---|---|
cpt_code | String (5 characters) | Current Procedural Terminology code used in the coding of medical services and procedures for billing. |
cpt_description | String | Description of the procedure. There are some unique CPT codes that share the same description. |
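
As a rough sketch of how the two CPT tables fit together, the snippet below attaches a description to each procedure row by looking up `cpt_code` in the dictionary. It uses only Python's standard `csv` module. The function names and the toy rows are illustrative assumptions, not part of the dataset; with the real files, handles from `open(...)` would replace the `io.StringIO` objects.

```python
import csv
import io


def load_cpt_descriptions(dictionary_file):
    """Map cpt_code -> cpt_description from an open MRKR_CPT_dictionary.csv handle."""
    reader = csv.DictReader(dictionary_file)
    return {row["cpt_code"]: row["cpt_description"] for row in reader}


def describe_procedures(cpt_file, descriptions):
    """Yield (empi_anon, cpt_code, description) for each row of MRKR_CPT.csv."""
    for row in csv.DictReader(cpt_file):
        code = row["cpt_code"]
        # Fall back to an empty string if a code is absent from the dictionary.
        yield row["empi_anon"], code, descriptions.get(code, "")


# Toy rows, invented for illustration only; they are not taken from the dataset.
toy_dictionary = io.StringIO(
    "cpt_code,cpt_description\n"
    "27447,Total knee arthroplasty\n"
)
toy_cpt = io.StringIO(
    "empi_anon,cpt_code,cpt_group_modifier,date_anon,age_at_procedure\n"
    "12345678,27447,RT,2015-01-01,64\n"
    "12345678,99999,,2015-02-01,64\n"
)

descriptions = load_cpt_descriptions(toy_dictionary)
labelled = list(describe_procedures(toy_cpt, descriptions))
```

Reading `cpt_code` as text rather than as a number keeps any leading zeros in the 5-character code intact.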
Filename: MRKR_ICD.csv
File size: 1.7 GB
Total rows: 21,956,056
Description: ICD9 and ICD10 diagnoses for patients with corresponding dates. Certain diseases of interest are indicated by binary flags to ease data cleaning.

Field Name | Data Type | Description |
---|---|---|
empi_anon | Integer (8 digits) | Unique patient identification number (83,011 unique patients) |
ICD9 | String | International Classification of Diseases - 9 (12,418 unique codes) |
ICD10 | String | International Classification of Diseases - 10 (26,963 unique codes) |
date_anon | Date | Date when the diagnosis code was entered. |
age_at_dx | Integer | Age when the diagnosis was recorded. |
DX_LINE | String | Primary, Secondary, Active, Not Recorded, Resolved, Canceled, Inactive. |
DX_ICD_SCOPE | String | Billing Diagnosis, Discharge Diagnosis, Admitting Diagnosis, Referring Diagnosis, Not Recorded, Reason For Visit, Problem List, Working Diagnosis, Other Diagnosis, Final, Pre-Op Diagnosis, Post-Op Diagnosis, Principal Diagnosis, Suggested Billing. |
autoimmune | Binary | If ICD code corresponds to an autoimmune disease such as rheumatoid arthritis, juvenile arthritis, or gout. |
diabetes | Binary | If ICD code corresponds to type I or type II diabetes. |
hypertension | Binary | If ICD code corresponds to hypertension. |
joint_infection | Binary | If ICD code corresponds to a knee joint infection. |
knee_osteoarthritis | Binary | If ICD code corresponds to knee osteoarthritis. |
knee_osteomyelitis | Binary | If ICD code corresponds to knee osteomyelitis. |
obesity | Binary | If ICD code corresponds to obesity. |
nicotine_use | Binary | If ICD code corresponds to nicotine dependence. |
trauma_lower_extremity | Binary | If ICD code corresponds to lower extremity trauma. |
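
Because MRKR_ICD.csv has roughly 22 million rows, one option is to stream it row by row instead of loading it whole. The sketch below collects the patients who have at least one diagnosis row with a given flag set. It assumes the binary flags are written as 1/0 (the `TRUE_VALUES` set is a guess at possible spellings), and both the function name and the toy rows are hypothetical.

```python
import csv
import io

# Assumption: binary flags are stored as 1/0; a few other spellings are accepted to be safe.
TRUE_VALUES = {"1", "1.0", "true", "True"}


def patients_with_flag(icd_file, flag):
    """Return the set of empi_anon values with at least one row where `flag` is set."""
    patients = set()
    for row in csv.DictReader(icd_file):
        if row[flag].strip() in TRUE_VALUES:
            patients.add(row["empi_anon"])
    return patients


# Toy rows with a subset of the columns, invented for illustration only.
toy_icd = (
    "empi_anon,ICD10,diabetes,knee_osteoarthritis\n"
    "10000001,M17.11,0,1\n"
    "10000001,E11.9,1,0\n"
    "10000002,E11.9,1,0\n"
)

knee_oa = patients_with_flag(io.StringIO(toy_icd), "knee_osteoarthritis")
diabetic = patients_with_flag(io.StringIO(toy_icd), "diabetes")
# Patients who carry both flags somewhere in their diagnosis history.
both = knee_oa & diabetic
```

Since the flags are set per diagnosis row, a patient-level cohort needs this kind of aggregation over all of a patient's rows.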
Filename: MRKR_ICD_dictionary.csv
File size: 1.9 MB
Total rows: 25,209
Description: Lookup table for ICD9 (International Classification of Diseases) and ICD10 codes and corresponding descriptions.

Field Name | Data Type | Description |
---|---|---|
ICD9 | String | ICD9 code. |
ICD10 | String | ICD10 code. |
DX_NAME | String | Diagnosis name or description. |
Filename: MRKR_pain.csv
File size: 137 MB
Total rows: 4,975,933
Description: Contains information on self-reported pain scores by patients during any encounter, including outpatient, emergency, and perioperative. Pain scores related to knees are curated.

Field Name | Data Type | Description |
---|---|---|
empi_anon | Integer (8 digits) | Unique patient identification number (83,011 unique patients) |
pain_location | String | Raw, uncurated strings of pain locations entered by staff. Approximately 75% of entries are blank. |
knee_pain | Binary | Curated using regular expressions to identify if the pain_location is definitely knee related. |
pain_score | Integer | 0 - 10 pain score. |
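
One way to use the curated `knee_pain` flag is to keep only knee-related rows and summarize `pain_score` per patient. The sketch below records the highest knee pain score for each patient. It assumes `knee_pain` is written as 1/0 and that missing scores appear as empty cells or the text `nan`; the function name and the toy rows are made up for illustration.

```python
import csv
import io


def max_knee_pain_by_patient(pain_file):
    """Return {empi_anon: highest pain_score} over rows curated as knee pain."""
    highest = {}
    for row in csv.DictReader(pain_file):
        # Assumption: knee_pain is stored as 1/0.
        if row["knee_pain"].strip() not in {"1", "1.0"}:
            continue
        text = row["pain_score"].strip()
        if text.lower() in {"", "nan"}:
            continue  # skip rows without a recorded score
        score = int(float(text))
        patient = row["empi_anon"]
        highest[patient] = max(score, highest.get(patient, score))
    return highest


# Toy rows, invented for illustration only.
toy_pain = io.StringIO(
    "empi_anon,pain_location,knee_pain,pain_score\n"
    "10000001,Left knee,1,7\n"
    "10000001,,0,9\n"
    "10000001,knee right,1,4\n"
    "10000002,Lower back,0,5\n"
)

result = max_knee_pain_by_patient(toy_pain)
```

Rows with a blank `pain_location` are never flagged as knee pain here, so scores from those encounters are left out of the summary.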
Filename: MRKR_demographics.csv
File size: 4.5 MB
Total rows: 83,011
Description: Patient demographics, indexed at the patient level.

Field Name | Data Type | Description |
---|---|---|
empi_anon | Integer (8 digits) | Unique patient identification number. |
sex | Nominal string | [male, female] - Patient sex. |
race | Nominal string | [African American or Black, American Indian or Alaskan Native, Asian, Caucasian or White, Multiple, Native Hawaiian or Other Pacific Islander, Unknown] - Patient self-reported race. |
ethnicity | Nominal string | [Hispanic patients, Non-Hispanic patients, Unknown, Unreported] - Patient reported ethnicity. |
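
Since this table has one row per patient, it can be held in a dictionary keyed by `empi_anon` and joined to the other tables. As a minimal sketch, the code below counts images in MRKR_image_metadata.csv by a demographic field. The function names and the toy rows are assumptions for illustration, and only a subset of the columns is shown.

```python
import csv
import io
from collections import Counter


def load_demographics(demographics_file):
    """Map empi_anon -> demographics row (one row per patient)."""
    return {row["empi_anon"]: row for row in csv.DictReader(demographics_file)}


def count_images_by(metadata_file, demographics, field):
    """Count image rows grouped by a demographic field such as 'race' or 'sex'."""
    counts = Counter()
    for row in csv.DictReader(metadata_file):
        patient = demographics.get(row["empi_anon"])
        # Images whose patient is not in the demographics table are grouped as "missing".
        counts[patient[field] if patient else "missing"] += 1
    return counts


# Toy rows, invented for illustration only.
toy_demographics = io.StringIO(
    "empi_anon,sex,race,ethnicity\n"
    "10000001,female,African American or Black,Non-Hispanic patients\n"
    "10000002,male,Caucasian or White,Unknown\n"
)
toy_metadata = io.StringIO(
    "empi_anon,SOPInstanceUID_anon\n"
    "10000001,uid-a\n"
    "10000001,uid-b\n"
    "10000002,uid-c\n"
)

demographics = load_demographics(toy_demographics)
images_by_race = count_images_by(toy_metadata, demographics, "race")
```

Counting at the image level and at the patient level can give different proportions, because patients contribute different numbers of radiographs.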
Filename: MRKR_image_metadata.csv
File size: 210 MB
Total rows: 503,261
Description: Contains relevant public DICOM metadata tags that may be helpful for identifying images. Patient and exam identifiers are replaced with de-identified versions in this table and within the DICOM files. Other metadata tags that do not contain PHI and are not in this table are left intact within the DICOM files. Fields containing PHI, such as patient name, addresses, or referring physician, are removed from this table and from the DICOM files. For data curation, the fields below were modified or added.

Field Name | Data Type | Description |
---|---|---|
empi_anon | Integer (8 digits) | De-identified patient identification number. |
StudyInstanceUID_anon | String | De-identified Study UID, shared between all images in the same study. |
SeriesInstanceUID_anon | String | De-identified Series UID, shared between all images in the same series. |
SOPInstanceUID_anon | String | De-identified SOP Instance UID which corresponds to a single DICOM image. |
img_height | Integer | Image pixel height. |
img_width | Integer | Image pixel width. |
laterality | Nominal string | [R: Right, L: Left, B: Bilateral, -1: Unknown or not present] - Laterality of the image, as inferred by a deep learning (DL) model. |
view_position | Nominal string | [F: Frontal, L: Lateral, S: Sunrise, I: Internal Oblique, E: External Oblique] - Anatomical projection of radiograph, as inferred by DL model. |
horizontal_flip | Binary | Indicates if the patient’s left side was oriented to the left side of the image, which is opposite of typical radiographic orientation, as inferred by DL model. |
weight_bearing | Binary | Indicates if the radiograph was weight-bearing as indicated by a marker and derived by DL model. A given exam may contain both weight-bearing and non-weight-bearing images. |
inverted | Binary | Indicates whether pixel intensity values are inverted from typical radiographic convention, as inferred by DL model. |
arthroplasty | Nominal string | [R: right, L: left, B: bilateral, NL: unknown (no laterality marker), NaN: no arthroplasty] - Indicates if image contains a knee arthroplasty and its laterality, as derived by DL model. |
L_KLG_inference | Integer [0,1,2,3,4,NaN] | KLG score of left knee in a bilateral knee radiograph, inferred by DL model. |
R_KLG_inference | Integer [0,1,2,3,4,NaN] | KLG score of right knee in a bilateral knee radiograph, inferred by DL model. |
SeriesDescription | String | DICOM Metadata describing the series. |
StudyDescription | String | DICOM metadata describing the study. |
StudyDate_anon | Date | De-identified date of radiograph. |
age_at_exam | Integer | Age of the patient when the radiograph was performed. |
dicom_path | String | Path to DICOM file. |
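
The curated fields above can be combined to pick out a subset of images before any DICOM file is opened. The sketch below yields `dicom_path` for images of a chosen view in which no arthroplasty was detected. It assumes NaN appears in the CSV as an empty cell or the literal text `NaN`; the function name, the `MISSING` set, and the toy paths are hypothetical.

```python
import csv
import io

# Assumption: NaN is written as an empty cell or as the literal text "nan"/"NaN".
MISSING = {"", "nan", "NaN"}


def select_native_knee_images(metadata_file, view="F"):
    """Yield dicom_path for images of the given view with no arthroplasty detected."""
    for row in csv.DictReader(metadata_file):
        if row["view_position"].strip() != view:
            continue
        if row["arthroplasty"].strip() not in MISSING:
            continue  # R, L, B or NL: an arthroplasty was detected
        yield row["dicom_path"]


# Toy rows with a subset of the columns, invented for illustration only.
toy_metadata = io.StringIO(
    "empi_anon,view_position,arthroplasty,dicom_path\n"
    "10000001,F,,toy/a.dcm\n"
    "10000001,L,,toy/b.dcm\n"
    "10000002,F,R,toy/c.dcm\n"
    "10000002,F,NL,toy/d.dcm\n"
)

selected = list(select_native_knee_images(toy_metadata))
```

Filtering on the metadata table first avoids reading DICOM files that would be discarded anyway, which matters given the 2.3 TB of image data.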