Download Data Sources

javild edited this page Feb 20, 2015 · 4 revisions

This tutorial explains how to download a set of raw files from several data sources. These raw files contain the core data used to populate the Cellbase knowledgebase. The download is run through the Cellbase CLI:

prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh download --help

Three main datasets will be downloaded for the human genome: genome sequence, gene annotation and variant annotation. Using the download command of the cellbase.sh script, a full command line could look like this:

prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh download -o /tmp/downloadTest --sequence --variation --gene --species "Homo sapiens"
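
If you run the download repeatedly, the command above can be wrapped in a small shell function. The following is only a sketch: `run_download`, `CELLBASE_SH` and `DRY_RUN` are hypothetical names introduced here, not part of Cellbase, and the function uses only the flags shown in the command above. Creating the output directory beforehand is an assumption; the CLI may well do this itself.

```shell
# Path taken from this tutorial; override CELLBASE_SH if your build lives elsewhere.
CELLBASE_SH="${CELLBASE_SH:-/tmp/cellbase/cellbase-app/build/bin/cellbase.sh}"

# run_download is a hypothetical wrapper, not part of Cellbase.
# Usage: run_download OUTPUT_DIR SPECIES
run_download() {
    out_dir="${1:-}"
    species="${2:-}"
    if [ -z "$out_dir" ] || [ -z "$species" ]; then
        echo "usage: run_download OUTPUT_DIR SPECIES" >&2
        return 2
    fi
    # With DRY_RUN=1, only print the command that would be executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$CELLBASE_SH download -o $out_dir --sequence --variation --gene --species \"$species\""
        return 0
    fi
    # Assumption: creating the output directory up front is harmless.
    mkdir -p "$out_dir" || return 1
    # Only flags that appear in the tutorial command are used here.
    "$CELLBASE_SH" download -o "$out_dir" --sequence --variation --gene --species "$species"
}
```

For example, `DRY_RUN=1 run_download /tmp/downloadTest "Homo sapiens"` prints the command without running it, which is a cheap way to check the path before starting a long download.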

Large files will be downloaded, so the time needed for completion may range from a few minutes to as long as an hour. The downloaded data should look like this:

/tmp/downloadTest/
└── homo_sapiens
    ├── gene
    │   ├── gene_extra_info_cellbase.log
    │   └── protein_function_prediction_matrices.log
    ├── sequence
    │   ├── genome_info.log
    │   ├── Homo_sapiens.GRCh37.p13.fa.gz
    │   └── Homo_sapiens.GRCh37.p13.fa.gz.log
    └── variation
        ├── allele_code.txt.gz
        ├── allele_code.txt.gz.log
        ├── allele.txt.gz
        ├── allele.txt.gz.log
        ├── attrib.txt.gz
        ├── attrib.txt.gz.log
        ├── attrib_type.txt.gz
        ├── attrib_type.txt.gz.log
        ├── genotype_code.txt.gz
        ├── genotype_code.txt.gz.log
        ├── motif_feature_variation.txt.gz
        ├── motif_feature_variation.txt.gz.log
        ├── phenotype_feature_attrib.txt.gz
        ├── phenotype_feature_attrib.txt.gz.log
        ├── phenotype_feature.txt.gz
        ├── phenotype_feature.txt.gz.log
        ├── phenotype.txt.gz
        ├── phenotype.txt.gz.log
        ├── population_genotype.txt.gz
        ├── population_genotype.txt.gz.log
        ├── population.txt.gz
        ├── population.txt.gz.log
        ├── seq_region.txt.gz
        ├── seq_region.txt.gz.log
        ├── source.txt.gz
        ├── source.txt.gz.log
        ├── structural_variation_feature.txt.gz
        ├── structural_variation_feature.txt.gz.log
        ├── study.txt.gz
        ├── study.txt.gz.log
        ├── transcript_variation.txt.gz
        ├── transcript_variation.txt.gz.log
        ├── variation_feature.txt.gz
        ├── variation_feature.txt.gz.log
        ├── variation_synonym.txt.gz
        ├── variation_synonym.txt.gz.log
        ├── variation.txt.gz
        └── variation.txt.gz.log
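
Before moving on, it can be worth checking that the archives arrived intact. The following sketch walks the output directory, tests every `.gz` file with `gzip -t`, and reports archives without a companion `.log` file. `check_download` is a hypothetical helper, not part of Cellbase, and the assumption that every archive has a matching `.log` is inferred only from the listing above.

```shell
# check_download is a hypothetical helper, not part of Cellbase.
# Usage: check_download DOWNLOAD_DIR
# Note: plain POSIX sh, so the variables below are global to the shell.
check_download() {
    dir="${1:-}"
    if [ ! -d "$dir" ]; then
        echo "MISSING_DIR $dir"
        return 1
    fi
    failures=0
    # Collect every gzip archive below the download directory.
    archives=$(find "$dir" -type f -name '*.gz' | sort)
    while IFS= read -r file; do
        # Skip the empty line produced when no archives were found.
        [ -n "$file" ] || continue
        # gzip -t tests the compressed stream without extracting it.
        if ! gzip -t "$file" 2>/dev/null; then
            echo "CORRUPT $file"
            failures=$((failures + 1))
        fi
        # Assumption from the listing: each archive has a matching .log file.
        if [ ! -f "$file.log" ]; then
            echo "NO_LOG $file"
            failures=$((failures + 1))
        fi
    done <<EOF
$archives
EOF
    echo "failures=$failures"
    # Succeed only when no problem was found.
    [ "$failures" -eq 0 ]
}
```

Calling `check_download /tmp/downloadTest` prints one line per problem found, followed by a `failures=N` summary, and returns a non-zero status if anything looks wrong. A passing check only means the archives are readable; it does not confirm that the set of files is complete.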

If the download was successful, you can proceed to build the JSON objects that will be loaded into the corresponding database: Building data.