# kallisto-nf-reproduce

This repository contains the software, scripts and data to reproduce the RNA-Seq results described in the Nextflow publication.
The repository contains two versions of a traditional bash-style pipeline, one for Mac and one for Linux (kallisto-mac and kallisto-linux), as well as the Nextflow version of the pipeline, which is compatible across platforms (kallisto-nf).
- Folder `R`: contains the `analysis.R` script for determining the overlapping sets
- Folder `kallisto-linux`: contains the scripts for running the native (bash), non-Nextflow version of the pipeline on Linux
- Folder `kallisto-mac`: contains the scripts for running the native (bash), non-Nextflow version of the pipeline on Mac OS X
- Folder `kallisto-nf`: contains the Nextflow version of the pipeline for running on any compatible platform
`kallisto-nf` exists as a git submodule within this repository. To clone the repository together with the submodule, pass the `--recursive` flag:
git clone --recursive https://github.com/cbcrg/kallisto-nf-reproduce.git
cd kallisto-nf-reproduce
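If the repository was cloned without `--recursive`, git leaves the `kallisto-nf` folder in place but empty. The following is a minimal sketch for detecting and repairing that situation. The helper names `submodule_populated` and `ensure_submodule` are illustrative and not part of this repository; `git submodule update --init --recursive` is the standard git command for fetching submodules after the fact.

```shell
# Hypothetical helper (not part of this repository): succeeds when the
# directory given as $1 exists and contains at least one entry.
submodule_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical helper: fetches submodules only when the folder given as $1
# is missing or empty. Must be called from inside the cloned repository.
ensure_submodule() {
    if ! submodule_populated "$1"; then
        git submodule update --init --recursive
    fi
}
```

From the top level of the clone, `ensure_submodule kallisto-nf` does nothing when the submodule is already checked out.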
All data are available from the original sources and also as a single compressed tarball (~22 GB).
To download and uncompress the data use the following command:
mkdir data
wget -O- https://zenodo.org/record/159158/files/kallisto_data.tar.gz | tar xz -C data
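Because the download is large, it can be worth confirming that the expected inputs are in place before starting a run. The sketch below checks for the three paths that the pipeline commands in this README pass as arguments. That the tarball unpacks into exactly this layout is an assumption inferred from those commands, and `check_data_layout` is a hypothetical helper name.

```shell
# Hypothetical helper: verifies that the inputs referenced by the pipeline
# commands in this README exist under the data directory given as $1.
# Prints each missing path to stderr and returns non-zero if any is absent.
check_data_layout() {
    data_dir=$1
    missing=0
    for path in \
        "$data_dir/raw_reads" \
        "$data_dir/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa" \
        "$data_dir/exp_info/hiseq_info.txt"
    do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_data_layout data` after extraction reports any path that is missing.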
If you wish to retrieve the data from the original sources, you can find it here:
- Reads: All Illumina HiSeq2000 read data can be downloaded from the NCBI SRA (GEO accession GSE37703).
- Transcriptome: The transcriptome GRCh38 release 79 (cDNA all) is available from the kallisto website here.
Install Kallisto version 0.42.4.
Install Sleuth.
Launch the Linux version of the kallisto bash pipeline script by running the following command:
./kallisto-linux/kallisto-std.sh \
data/raw_reads \
data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
data/exp_info/hiseq_info.txt \
results-linux
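The steps above pin Kallisto to version 0.42.4, so it can help to confirm the installed version before comparing results. The following is a minimal sketch with two hypothetical helpers. It assumes that `kallisto version` prints a line containing the version number in `x.y.z` form; the exact output format is not specified in this README.

```shell
# Hypothetical helper: prints the first x.y.z version number found in $1.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeeds only if the version found in the text $1
# is exactly the expected version $2.
version_matches() {
    [ "$(extract_version "$1")" = "$2" ]
}
```

A possible use is `version_matches "$(kallisto version)" 0.42.4 || echo "unexpected kallisto version" >&2`.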
Install Kallisto version 0.42.4.
Install Sleuth.
Launch the Mac version of the kallisto bash pipeline script by running the following command:
./kallisto-mac/kallisto-std.sh \
data/raw_reads \
data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
data/exp_info/hiseq_info.txt \
results-mac
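Once both native runs have finished and their output folders have been copied to the same machine, a rough first check is to compare them file by file. The sketch below does this with POSIX `cksum`, which is available on both Linux and Mac. The helper names are hypothetical, and the approach is deliberately crude: any output file that embeds run metadata such as timestamps will differ even when the quantifications agree, so a mismatch is a prompt to look closer, not a verdict.

```shell
# Hypothetical helper: prints "path checksum size" for every regular file
# under the directory $1, sorted by path relative to that directory.
dir_checksums() {
    ( cd "$1" && find . -type f | sort | while IFS= read -r f; do
        printf '%s %s\n' "$f" "$(cksum < "$f")"
    done )
}

# Hypothetical helper: succeeds when the directories $1 and $2 contain the
# same relative paths with identical contents; returns 2 if either
# directory cannot be read.
same_results() {
    left=$(dir_checksums "$1") || return 2
    right=$(dir_checksums "$2") || return 2
    [ "$left" = "$right" ]
}
```

For example, `same_results results-linux results-mac || echo "outputs differ"` flags any difference between the two runs.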
Install Nextflow with the following command:
curl -fsSL get.nextflow.io | bash
Install Docker following the instructions on this page.
Pull the Docker image used for this experiment (optional):
docker pull cbcrg/kallisto-nf@sha256:9f840127392d04c9f8e39cb72bcd62ff53cfe0492dde02dc3749bf15f1c547f1
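The image above is referenced by its content digest (`@sha256:...`) instead of a tag, so the same image is pulled even if a tag is later moved to a newer build. The following is a small sketch for checking that an image reference is pinned this way; `is_digest_pinned` is a hypothetical helper, and it only checks the shape of the reference, not whether the image exists.

```shell
# Hypothetical helper: succeeds when $1 is an image reference pinned by a
# sha256 digest, i.e. "<name>@sha256:" followed by 64 lowercase hex digits.
is_digest_pinned() {
    printf '%s\n' "$1" | grep -Eq '^[^@]+@sha256:[0-9a-f]{64}$'
}
```

A tag-based reference such as `cbcrg/kallisto-nf:latest` fails this check, while the digest reference used above passes.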
Once the read data have been downloaded, the Nextflow version of the pipeline, located in the kallisto-nf directory, can be run from the top level of this repository with the following command:
nextflow run kallisto-nf/kallisto.nf \
--reads 'data/raw_reads/SRR4933*_{1,2}.fastq' \
--transcriptome data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
--experiment data/exp_info/hiseq_info.txt \
--output kallisto-nf-results \
-with-docker
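The `--reads` pattern `SRR4933*_{1,2}.fastq` matches files that share a sample prefix and end in `_1.fastq` or `_2.fastq`, which is the usual naming for the two mates of a paired-end run. Before launching, it can be useful to confirm that every sample has both files. The sketch below lists complete pairs and reports unpaired samples; `list_complete_pairs` is a hypothetical helper, and how the pipeline itself groups the files is not described in this README.

```shell
# Hypothetical helper: prints the sample ID of every read pair in the
# directory $1 for which both the _1.fastq and _2.fastq files are present.
# Samples with a _1 file but no _2 file are reported on stderr.
list_complete_pairs() {
    for r1 in "$1"/*_1.fastq; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        sample=$(basename "$r1" _1.fastq)
        if [ -e "$1/${sample}_2.fastq" ]; then
            echo "$sample"
        else
            echo "unpaired: $sample" >&2
        fi
    done
}
```

Running `list_complete_pairs data/raw_reads` prints one sample ID per line for each complete pair.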