This repository contains the key components of the DHR ETL pipeline, which standardises and harmonises multi-modal patient data under a unified data standard.
The components of the pipeline are as follows:
- Data Model: Contains the conceptual data model definition, which extends OMOP CDM v5.4 to include Imaging Occurrence, Imaging features and Methylation entities
- ETL: Contains the Nextflow + dbt (data build tool) pipeline for data profiling and transformation
- Mappings: Contains mapping specifications for the multi-modal data model, created with the Rabbit in a Hat software. The mappings use the following data sources:
  - Synthea v3.2
  - TCGA WSI
  - Methylation (TBD)
For queries, contact [email protected]