Results archive
The zipped archive contains the following data and subfolders:
- alignment: merged BAM file with index, md5sums and alignment statistics (.Log.final.out)
- expression: text files with gene-level quantification per sample and per project.
- fastqc: FastQC output
- qcmetrics: multiple QC metrics and images generated with Picard tools or SAMtools flagstat.
- leafcutter: Leafcutter and RegTools output files.
- expression/Deseq2: DESeq2 differential expression analysis.
- multiqc_data: combined MultiQC tables used for the MultiQC HTML report.
- variants: variant calls made with GATK.
- rawdata: raw sequence file in the form of a gzipped fastq file (.fq.gz)
The root of the results directory contains the final QC report, README.txt, analysis results from each tool, and the samplesheet which formed the basis for this analysis.
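To confirm that an extracted archive has the layout described above, a small check along the following lines may help. This is only a sketch: the function name check_results_layout is hypothetical and not part of the pipeline, and the subfolder names are taken directly from the list above.

```shell
#!/usr/bin/env bash
# Sketch: report which of the documented subfolders are missing from an
# extracted results directory. check_results_layout is a hypothetical
# helper; the subfolder names come from the list in this README.
check_results_layout() {
    local results_dir="$1"
    local missing=0
    local sub
    for sub in alignment expression expression/Deseq2 fastqc qcmetrics \
               leafcutter multiqc_data variants rawdata; do
        if [ ! -d "${results_dir}/${sub}" ]; then
            echo "missing: ${sub}"
            missing=1
        fi
    done
    # Exit status 0 when every subfolder is present, 1 otherwise.
    return "${missing}"
}
```

Calling check_results_layout with the path of the extracted results directory prints one line per missing subfolder and returns a non-zero status if anything is absent.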
- Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR: STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013, 29(1):15-21. doi: 10.1093/bioinformatics/bts635.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078–2079.
- Anders S, Pyl PT, Huber W: HTSeq – A Python framework to work with high-throughput sequencing data. 2014:0–5.
- Andrews, S. (2010). FastQC a Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ ${samtoolsVersion}
- Picard Sourceforge Web site. http://picard.sourceforge.net/ ${picardVersion}
- McKenna A et al.: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 2010, 20:1297-303. Version: ${gatkVersion}
- Li YI, Knowles DA, Humphrey J, et al. Annotation-free quantification of RNA splicing using LeafCutter. Nat Genet. 2018;50(1):151-158. doi:10.1038/s41588-017-0004-9
scp -r SEQSTARTDATE_SEQ_RUNTEST_FLOWCELLXX username@yourcluster:${root}/groups/$groupname/${tmpDir}/rawdata/ngs/YOURDIR
mkdir ${root}/groups/$groupname/${tmpDir}/generatedscripts/TestRun
scp -r TestRun.csv username@yourcluster:/groups/$groupname/${tmpDir}/generatedscripts/
Note: the folder name must match the name of the samplesheet (.csv) file, for example TestRun for TestRun.csv.
Note 2: an example samplesheet can be found in $EBROOTNGS_RNA/templates/externalSamplesheet.csv
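The naming rule in the note can be checked before generating any scripts. The following is a minimal sketch; names_match is a hypothetical helper, not something NGS_RNA provides, and it only compares the folder name with the samplesheet file name minus its .csv extension.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a project folder is named after its samplesheet.
# names_match is a hypothetical helper, not part of NGS_RNA.
names_match() {
    local folder="$1"
    local samplesheet="$2"
    local folder_name sheet_name
    # Last path component of the folder, e.g. TestRun
    folder_name="$(basename "${folder}")"
    # File name of the samplesheet without the .csv suffix, e.g. TestRun
    sheet_name="$(basename "${samplesheet}" .csv)"
    [ "${folder_name}" = "${sheet_name}" ]
}
```

For example, names_match generatedscripts/TestRun generatedscripts/TestRun.csv succeeds, while a folder called OtherRun paired with TestRun.csv returns a non-zero status.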
module load NGS_RNA
cd ${root}/groups/$groupname/${tmpDir}/generatedscripts/TestRun
cp $EBROOTNGS_RNA/generate_template.sh .
bash generate_template.sh
cd scripts
Note: to run the pipeline locally, change the backend in the CreateInhouseProjects.sh script. Near the end of the script there is a line similar to: sh ${EBROOTMOLGENISMINCOMPUTE}/molgenis_compute.sh. In that line, search for -b slurm and change it to -b localhost.
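The backend change described in the note can also be made with sed rather than by hand. The sketch below assumes the script contains the literal text "-b slurm" with a plain ASCII hyphen; use_localhost_backend is a hypothetical helper name, and the location of CreateInhouseProjects.sh on your system is not specified here, so pass its path explicitly.

```shell
#!/usr/bin/env bash
# Sketch: rewrite "-b slurm" to "-b localhost" in a script, keeping a backup.
# use_localhost_backend is a hypothetical helper, not part of NGS_RNA.
use_localhost_backend() {
    local script="$1"
    # Keep the original next to the edited file as <script>.bak
    cp "${script}" "${script}.bak"
    # Replace the first "-b slurm" on each line; writing back into the
    # existing file keeps its permissions unchanged.
    sed 's/-b slurm/-b localhost/' "${script}.bak" > "${script}"
}
```

A call such as use_localhost_backend CreateInhouseProjects.sh edits the file in place; inspect the result before running it, and restore from the .bak copy to return to the slurm backend.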
bash submit.sh
Navigate to the jobs folder. Its location is printed by the previous step (step 4).
bash submit.sh