Build & Load Data

javild edited this page Feb 20, 2015 · 1 revision

This tutorial walks you through building the JSON documents that are loaded into the Cellbase knowledgebase. The build is carried out with the Cellbase CLI, whose build options can be listed with:

prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --help

Three main datasets will be built and loaded for the human genome: genome sequence, gene annotation and variant annotation.

Use the Cellbase CLI for building "genome sequence" data, for example:

prompt$ mkdir /tmp/buildTest/
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build genome-sequence --input /tmp/downloadTest/homo_sapiens/sequence/Homo_sapiens.GRCh37.p13.fa.gz --output /tmp/buildTest/

Note: the build may require up to 2GB of RAM and take up to 20 minutes, depending on the hardware. A genome_sequence.json.gz file should be created:

/tmp/buildTest/genome_sequence.json.gz
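Because a build can run for a long time, it may be worth confirming that the output file is complete before moving on. The following is a minimal sketch, not part of Cellbase: check_build_output is a hypothetical helper name, and it only checks that the file exists, is non-empty and passes gzip's integrity test. It does not validate the JSON content.

```shell
# Hypothetical helper (not part of the Cellbase CLI).
# Usage: check_build_output DIR NAME
# Returns 0 when DIR/NAME exists, is non-empty and is a valid gzip file.
check_build_output() {
    dir=$1
    name=$2
    # Strip a trailing slash so "/tmp/buildTest/" and "/tmp/buildTest" both work.
    file="${dir%/}/${name}"

    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi

    # gzip -t decompresses the whole file in memory-less test mode and
    # fails on truncated or corrupt archives.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "not a valid gzip file: $file" >&2
        return 1
    fi

    echo "ok: $file"
}
```

For example, after the genome sequence build:

prompt$ check_build_output /tmp/buildTest/ genome_sequence.json.gz

The same helper can be reused for the other output files produced by later builds.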

Use the Cellbase CLI for building "gene" data, for example:

prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build gene --input /tmp/downloadTest/homo_sapiens/gene/ -o /tmp/buildTest/ --species "Homo sapiens" --reference-genome-file /tmp/homo_sapiens/sequence/Homo_sapiens.GRCh37.fa.gz

Note: the build may take around 20 minutes, depending on the hardware. A gene.json.gz file should be created:

/tmp/buildTest/gene.json.gz

Use the Cellbase CLI for building "variant" data, for example:

prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh build --build variation --input /tmp/downloadTest/homo_sapiens/variation/ --output /tmp/buildTest/

Note: the build may require up to 13GB of RAM and 30GB of free hard disk space, and can take 5 to 7 hours, depending on the hardware. A variation.json.gz file should be created:

/tmp/buildTest/variation.json.gz
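Given the disk space requirement above, a pre-flight check before starting the multi-hour variation build can avoid a late failure. This is a minimal sketch under stated assumptions: has_free_space is a hypothetical helper name (not part of Cellbase), it relies on the POSIX output format of df -P, and it only looks at the filesystem that holds the output directory.

```shell
# Hypothetical helper (not part of the Cellbase CLI).
# Usage: has_free_space DIR MIN_GB
# Succeeds when the filesystem holding DIR has at least MIN_GB GiB available.
has_free_space() {
    dir=$1
    min_gb=$2
    # -P forces one line per filesystem; -k reports sizes in 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((min_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, before launching the variation build:

prompt$ has_free_space /tmp/buildTest/ 30 || echo "less than 30GB free in /tmp/buildTest/" >&2

The check does not cover RAM, which still has to be verified separately on the build machine.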