GrEBI (Graphs@EBI)

HPC pipeline that integrates knowledge graphs from EMBL-EBI resources, the MONARCH Initiative KG, ROBOKOP, Ubergraph, and other sources into giant (multi-terabyte) materialised, clique-merged Neo4j+Solr+RocksDB databases.

Datasource	Loaded from
IMPC	EBI
GWAS Catalog	EBI
OLS	EBI
OpenTargets	EBI
Metabolights	EBI
ChEMBL	EBI
Reactome	EBI, MONARCH
BGee	MONARCH
BioGrid	MONARCH
Gene Ontology (GO) Annotation Database	MONARCH
HGNC (HUGO Gene Nomenclature Committee)	MONARCH
Human Phenotype Ontology Annotations (HPOA)	MONARCH
NCBI Gene	MONARCH
PHENIO	MONARCH
PomBase	MONARCH
ZFIN	MONARCH
MedGen	MONARCH
Protein ANalysis THrough Evolutionary Relationships (PANTHER)	MONARCH, ROBOKOP
STRING	MONARCH, ROBOKOP
Comparative Toxicogenomics Database (CTD)	MONARCH, ROBOKOP
Alliance of Genome Resources	MONARCH, ROBOKOP
BINDING	ROBOKOP
CAM KG	ROBOKOP
Drug Central	ROBOKOP
The Genotype-Tissue Expression (GTEx) portal	ROBOKOP
Guide to Pharmacology database (GtoPdb)	ROBOKOP
Hetionet	ROBOKOP
HMDB	ROBOKOP
Human GOA	ROBOKOP
Integrated Clinical and Environmental Exposures Service (ICEES) KG	ROBOKOP
IntAct	ROBOKOP
Pharos	ROBOKOP
Text Mining Provider KG	ROBOKOP
Viral Proteome	ROBOKOP
AOPWiki	AOPWikiRDF
Ubergraph
MeSH
Human Reference Atlas KG

The resulting graphs can be downloaded from https://ftp.ebi.ac.uk/pub/databases/spot/kg/ebi/

Implementation

The pipeline is implemented as Rust programs with simple CLIs, orchestrated with Nextflow.

The primary output of the pipeline is a property graph for Neo4j. The input format (produced by ingest steps that extract data from KGX, RDF, and bespoke database formats) is simple JSONL files, to which "bruteforce" integration is applied:

All strings that begin with any IRI or CURIE prefix from the Bioregistry are canonicalised to the standard CURIE form
All property values that are the identifier of another node in the graph become edges
Cliques of equivalent nodes are merged into single nodes
Cliques of equivalent properties are merged into single properties (and for ontology-defined properties, the qualified safe labels are used)
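
The first three steps above can be illustrated with a minimal Python sketch. This is not the pipeline's Rust implementation, only a toy under stated assumptions: the prefix map, the field names `id`, `equivalent_to` and `associated_with`, and the example identifiers are all hypothetical, and the real JSONL schema may differ. Merging of equivalent properties is not shown.

```python
import json

# Hypothetical prefix map (IRI prefix -> CURIE prefix). The real pipeline
# takes its prefixes from the Bioregistry.
PREFIX_MAP = {
    "http://example.org/a/": "A:",
    "http://example.org/b/": "B:",
}

# Hypothetical JSONL input; field names are assumptions, not GrEBI's schema.
JSONL = """\
{"id": "http://example.org/a/1", "label": "disease one", "equivalent_to": ["B:7"]}
{"id": "http://example.org/b/7", "synonym": "d1"}
{"id": "G:42", "label": "gene forty-two", "associated_with": "http://example.org/b/7"}
"""

def canonicalise(value):
    """Rewrite strings (or lists of strings) that start with a known prefix."""
    if isinstance(value, list):
        return [canonicalise(v) for v in value]
    if isinstance(value, str):
        for prefix, curie_prefix in PREFIX_MAP.items():
            if value.startswith(prefix):
                return curie_prefix + value[len(prefix):]
    return value

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Join two cliques; the smaller identifier becomes the representative."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def integrate(jsonl_text):
    # Step 1: parse each line and canonicalise every string value.
    records = [
        {k: canonicalise(v) for k, v in json.loads(line).items()}
        for line in jsonl_text.splitlines() if line.strip()
    ]

    # Step 3 (done before edges here): group equivalent identifiers.
    # Assumes "equivalent_to" is always a list of identifiers.
    parent = {}
    for rec in records:
        parent.setdefault(rec["id"], rec["id"])
        for other in rec.get("equivalent_to", []):
            parent.setdefault(other, other)
            union(parent, rec["id"], other)

    # Merge each clique into one node, pooling property values into lists.
    nodes = {}
    for rec in records:
        node = nodes.setdefault(find(parent, rec["id"]), {"ids": set()})
        node["ids"].add(rec["id"])
        for key, value in rec.items():
            if key not in ("id", "equivalent_to"):
                values = value if isinstance(value, list) else [value]
                node.setdefault(key, []).extend(values)

    # Step 2: any property value that identifies a node becomes an edge.
    edges = []
    for root, node in nodes.items():
        for key, values in node.items():
            if key == "ids":
                continue
            for v in values:
                if isinstance(v, str) and v in parent and find(parent, v) in nodes:
                    edges.append((root, key, find(parent, v)))
    return nodes, edges

nodes, edges = integrate(JSONL)
print(sorted(nodes))  # ['A:1', 'G:42']
print(edges)          # [('G:42', 'associated_with', 'A:1')]
```

In this toy the edge points at the merged node `A:1` even though the source record referred to `B:7` by its IRI, which is the effect of combining canonicalisation with clique merging.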

In addition to Neo4j, the nodes and edges are loaded into Solr for full-text search and RocksDB for id->object resolution.
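
The division of labour between the two auxiliary stores can be sketched roughly as follows. Plain Python dictionaries stand in for RocksDB (key-value lookup) and Solr (token-based search); the node layout and all function names are hypothetical, and neither store's real API is used.

```python
import json

# Hypothetical merged nodes; the "ids" and "label" fields are assumptions.
NODES = [
    {"ids": ["A:1", "B:7"], "label": ["disease one"]},
    {"ids": ["G:42"], "label": ["gene forty-two"]},
]

def build_resolver(nodes):
    """Key-value stand-in: every identifier maps to its serialised node."""
    index = {}
    for node in nodes:
        blob = json.dumps(node, sort_keys=True)
        for identifier in node["ids"]:
            index[identifier] = blob
    return index

def resolve(index, identifier):
    """Return the node for an identifier, or None if it is unknown."""
    blob = index.get(identifier)
    return json.loads(blob) if blob is not None else None

def build_text_index(nodes):
    """Search stand-in: each lower-cased label token maps to node ids."""
    index = {}
    for node in nodes:
        for label in node.get("label", []):
            for token in label.lower().split():
                index.setdefault(token, set()).add(node["ids"][0])
    return index

resolver = build_resolver(NODES)
text_index = build_text_index(NODES)

# Search by word first, then resolve the hit to the full object.
hits = text_index.get("disease", set())
print([resolve(resolver, hit)["label"] for hit in sorted(hits)])
```

Any identifier in a merged clique resolves to the same object, so `resolve(resolver, "B:7")` and `resolve(resolver, "A:1")` return equal nodes in this sketch.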

