Build & Load Data
This tutorial will guide you through building the JSON documents that are loaded into the Cellbase knowledgebase. The build is carried out with the Cellbase CLI, whose options are listed by:
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --help
Three main datasets will be built and loaded for the human genome: genome sequence, gene annotation and variant annotation.
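The three builds can be chained in a small wrapper script. The sketch below is a hedged example and not part of Cellbase: `cellbase_build`, `build_all`, `CELLBASE`, `DOWNLOAD_DIR` and `BUILD_DIR` are hypothetical names. The default paths follow the examples on this page, it assumes the reference genome for the gene build is the FASTA downloaded for the genome-sequence step, and it only passes the `build` flags shown on this page.

```shell
#!/usr/bin/env bash

# Launcher and directories; each can be overridden from the environment.
# (Hypothetical variable names; defaults match the examples on this page.)
CELLBASE="${CELLBASE:-/tmp/cellbase/cellbase-app/build/bin/cellbase.sh}"
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/tmp/downloadTest/homo_sapiens}"
BUILD_DIR="${BUILD_DIR:-/tmp/buildTest}"

# cellbase_build TYPE INPUT [EXTRA_ARGS...]
# Runs one "build" step, always writing into $BUILD_DIR.
cellbase_build() {
  local type="$1" input="$2"
  shift 2
  "$CELLBASE" build --build "$type" --input "$input" --output "$BUILD_DIR/" "$@"
}

# build_all: runs the three builds in order and stops at the first failure.
build_all() {
  local genome="$DOWNLOAD_DIR/sequence/Homo_sapiens.GRCh37.p13.fa.gz"
  mkdir -p "$BUILD_DIR" &&
    cellbase_build genome-sequence "$genome" &&
    cellbase_build gene "$DOWNLOAD_DIR/gene/" \
      --species "Homo sapiens" --reference-genome-file "$genome" &&
    cellbase_build variation "$DOWNLOAD_DIR/variation/"
}
```

After sourcing the file, call `build_all`. For a dry run, `export CELLBASE=echo` first: each command is then printed instead of executed. Chaining with `&&` avoids `set -e`, which would otherwise close an interactive shell that sources the file.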
Use the Cellbase CLI for building "genome sequence" data, for example:
prompt$ mkdir /tmp/buildTest/
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build genome-sequence --input /tmp/downloadTest/homo_sapiens/sequence/Homo_sapiens.GRCh37.p13.fa.gz --output /tmp/buildTest/
Note: the build may require up to 2 GB of RAM and take up to 20 minutes, depending on the hardware. A genome_sequence.json.gz file should be created:
/tmp/buildTest/genome_sequence.json.gz
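Before moving on, it can help to confirm that the output file is complete. The sketch below uses a hypothetical helper, `check_build_output` (not part of Cellbase). It only checks that the file exists, is non-empty and passes `gzip -t`; it does not validate the JSON content.

```shell
#!/usr/bin/env bash

# check_build_output FILE
# Hypothetical helper: succeeds only if FILE exists, is non-empty
# and passes gzip's integrity test.
check_build_output() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "corrupt or not gzip-compressed: $file" >&2
    return 1
  fi
  # Report the compressed size alongside the path.
  echo "OK: $file ($(du -h "$file" | cut -f1))"
}
```

For example, `check_build_output /tmp/buildTest/genome_sequence.json.gz`. The same call works for the gene.json.gz and variation.json.gz files produced by the later builds.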
Use the Cellbase CLI for building "gene" data, for example:
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build gene --input /tmp/downloadTest/homo_sapiens/gene/ --output /tmp/buildTest/ --species "Homo sapiens" --reference-genome-file /tmp/downloadTest/homo_sapiens/sequence/Homo_sapiens.GRCh37.p13.fa.gz
Note: the build may take around 20 minutes, depending on the hardware. A gene.json.gz file should be created:
/tmp/buildTest/gene.json.gz
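To take a quick look at the built documents, the compressed file can be streamed without unpacking it to disk. This sketch assumes, without having verified it against Cellbase, that each output file holds one JSON document per line. `count_records` and `peek_records` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# count_records FILE
# Counts lines in a gzip-compressed file; under the one-document-per-line
# assumption, this is the number of JSON documents.
count_records() {
  gzip -dc "$1" | wc -l | tr -d ' '
}

# peek_records FILE [N]
# Prints the first N lines (default 1) of a gzip-compressed file.
peek_records() {
  gzip -dc "$1" | head -n "${2:-1}"
}
```

For example, `count_records /tmp/buildTest/gene.json.gz` or `peek_records /tmp/buildTest/gene.json.gz 2`. The output can be piped into a JSON pretty-printer such as `jq .` if one is installed.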
Use the Cellbase CLI for building "variant" data, for example:
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build variation --input /tmp/downloadTest/homo_sapiens/variation/ --output /tmp/buildTest/
Note: the build may require up to 13 GB of RAM and 30 GB of free disk space, and can take 5-7 hours, depending on the hardware.
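Because this build runs for hours, it is worth checking disk space before starting. The sketch below uses a hypothetical helper, `has_free_space`. It reads the "Available" column of POSIX `df -Pk` output, which is reported in 1024-byte blocks, and compares it against a threshold given in GB. It does not check RAM.

```shell
#!/usr/bin/env bash

# has_free_space DIR GB
# Hypothetical pre-flight check: succeeds if the filesystem holding DIR
# has at least GB gibibytes available.
has_free_space() {
  local dir="$1" need_gb="$2" avail_kb
  # Line 2, column 4 of `df -Pk` is the available space in KiB.
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example, `has_free_space /tmp/buildTest 30 || echo "less than 30 GB free" >&2` before launching the variation build.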
A variation.json.gz file should be created:
/tmp/buildTest/variation.json.gz