Download Data Sources
This tutorial guides you through downloading a set of raw files from several data sources. These raw files contain the core data that will populate the Cellbase knowledgebase. The download is done through the Cellbase CLI:
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh download --help
Three main datasets are downloaded for the human genome: genome sequence, gene annotation, and variant annotation. Using the download command of the cellbase.sh script, a full command line could be:
prompt$ /tmp/cellbase/cellbase-app/build/bin/cellbase.sh download -o /tmp/downloadTest --sequence --variation --gene --species "Homo sapiens"
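For repeated runs, the command above can be wrapped in a small shell function. The following is only a sketch: the function name cellbase_download and the CELLBASE_SH and DRY_RUN variables are hypothetical names introduced for this example, and creating the output directory beforehand is a precaution rather than a documented requirement. Only the flags shown in the command above are used.

```shell
# Path to the CLI script; override by exporting CELLBASE_SH (hypothetical variable).
CELLBASE_SH="${CELLBASE_SH:-/tmp/cellbase/cellbase-app/build/bin/cellbase.sh}"

# cellbase_download OUTDIR SPECIES
# Downloads sequence, variation and gene data for SPECIES into OUTDIR.
# With DRY_RUN=1 (hypothetical switch) the command is printed, not executed.
cellbase_download() {
    if [ "$#" -ne 2 ]; then
        echo "usage: cellbase_download OUTDIR SPECIES" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$CELLBASE_SH download -o $1 --sequence --variation --gene --species \"$2\""
        return 0
    fi
    # Precaution only: make sure the output directory exists before downloading.
    mkdir -p "$1" || return 1
    "$CELLBASE_SH" download -o "$1" --sequence --variation --gene --species "$2"
}
```

Calling cellbase_download /tmp/downloadTest "Homo sapiens" with DRY_RUN=1 set prints the command line without starting any download, which is a cheap way to check the arguments first.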
Large files are downloaded, so the time needed for completion may vary from a few minutes to as much as an hour. The downloaded data should look like this:
/tmp/downloadTest/
└── homo_sapiens
    ├── gene
    │   ├── gene_extra_info_cellbase.log
    │   └── protein_function_prediction_matrices.log
    ├── sequence
    │   ├── genome_info.log
    │   ├── Homo_sapiens.GRCh37.p13.fa.gz
    │   └── Homo_sapiens.GRCh37.p13.fa.gz.log
    └── variation
        ├── allele_code.txt.gz
        ├── allele_code.txt.gz.log
        ├── allele.txt.gz
        ├── allele.txt.gz.log
        ├── attrib.txt.gz
        ├── attrib.txt.gz.log
        ├── attrib_type.txt.gz
        ├── attrib_type.txt.gz.log
        ├── genotype_code.txt.gz
        ├── genotype_code.txt.gz.log
        ├── motif_feature_variation.txt.gz
        ├── motif_feature_variation.txt.gz.log
        ├── phenotype_feature_attrib.txt.gz
        ├── phenotype_feature_attrib.txt.gz.log
        ├── phenotype_feature.txt.gz
        ├── phenotype_feature.txt.gz.log
        ├── phenotype.txt.gz
        ├── phenotype.txt.gz.log
        ├── population_genotype.txt.gz
        ├── population_genotype.txt.gz.log
        ├── population.txt.gz
        ├── population.txt.gz.log
        ├── seq_region.txt.gz
        ├── seq_region.txt.gz.log
        ├── source.txt.gz
        ├── source.txt.gz.log
        ├── structural_variation_feature.txt.gz
        ├── structural_variation_feature.txt.gz.log
        ├── study.txt.gz
        ├── study.txt.gz.log
        ├── transcript_variation.txt.gz
        ├── transcript_variation.txt.gz.log
        ├── variation_feature.txt.gz
        ├── variation_feature.txt.gz.log
        ├── variation_synonym.txt.gz
        ├── variation_synonym.txt.gz.log
        ├── variation.txt.gz
        └── variation.txt.gz.log
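Before moving on, it can help to check the download directory against the listing above. The sketch below assumes, based only on that listing, that every .gz file is expected to have a companion .log file; the contents of the logs are not inspected. The function name check_downloads is hypothetical, and gzip -t is used to test each archive for corruption.

```shell
# check_downloads DIR
# Prints one line per problem found under DIR and returns 1 if there is any.
# Heuristic only: a .gz file without a ".log" companion is reported, as is
# any archive that fails gzip's built-in integrity test.
check_downloads() {
    problems=$(
        find "$1" -type f -name '*.gz' | sort | while IFS= read -r f; do
            [ -f "$f.log" ] || echo "missing log: $f"
            gzip -t "$f" 2>/dev/null || echo "corrupt archive: $f"
        done
    )
    if [ -n "$problems" ]; then
        echo "$problems"
        return 1
    fi
    echo "all downloads look complete"
}
```

For the example above, this would be run as check_downloads /tmp/downloadTest. A truncated transfer would typically show up as a "corrupt archive" line.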
If the download was successful, you can proceed to building the JSON objects that will be loaded into the corresponding database: see Build & Load Data.