In order to use this module, the following dependencies are required:
Input data should be added to `samples.tsv` and `units.tsv`.
The following information needs to be added to these files:
Column Id | Description |
---|---|
samples.tsv | |
sample | unique sample/patient id, one per row |
units.tsv | |
sample | same sample/patient id as in samples.tsv |
type | data type identifier (one letter), one of T (Tumor), N (Normal), R (RNA) |
platform | type of sequencing platform, e.g. NovaSeq |
machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
flowcell | identifier of the flowcell used |
lane | flowcell lane number |
barcode | sequence library barcode/index; connect forward and reverse indices with +, e.g. ATGC+ATGC |
fastq1/2 | absolute paths to the forward and reverse reads |
adapter | adapter sequences to be trimmed, separated by a comma |
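
As an illustration of the layout described in the table above, the following sketch builds a one-row samples.tsv and units.tsv in memory and checks them against each other. It is only a sketch under stated assumptions: `fastq1/2` is taken to mean two columns named `fastq1` and `fastq2`, the one-letter type codes are assumed to be T, N, and R, and every value in the example row (sample id, machine id, flowcell, paths, adapters) is a placeholder. The helper names `read_tsv` and `check_units` are hypothetical and not part of the pipeline.

```python
import csv
import io

# Column names taken from the table above; "fastq1/2" is ASSUMED to mean
# two separate columns, fastq1 and fastq2.
UNITS_COLUMNS = ["sample", "type", "platform", "machine", "flowcell",
                 "lane", "barcode", "fastq1", "fastq2", "adapter"]
# ASSUMED one-letter codes for Tumor, Normal, RNA.
ALLOWED_TYPES = {"T", "N", "R"}


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_units(samples, units):
    """Return a list of problems; an empty list means the tables agree."""
    problems = []
    sample_ids = [row["sample"] for row in samples]
    if len(sample_ids) != len(set(sample_ids)):
        problems.append("samples.tsv: sample ids are not unique")
    for i, row in enumerate(units, start=1):
        # A column counts as missing if it is absent or empty.
        missing = [c for c in UNITS_COLUMNS if not row.get(c)]
        if missing:
            problems.append(f"units.tsv row {i}: missing {', '.join(missing)}")
        if row.get("sample") not in sample_ids:
            problems.append(f"units.tsv row {i}: unknown sample {row.get('sample')!r}")
        if row.get("type") not in ALLOWED_TYPES:
            problems.append(f"units.tsv row {i}: type must be one of {sorted(ALLOWED_TYPES)}")
    return problems


# Placeholder content only; none of these values come from a real run.
SAMPLES_TSV = "sample\nsample1\n"
UNITS_TSV = "\t".join(UNITS_COLUMNS) + "\n" + "\t".join([
    "sample1", "N", "NovaSeq", "@Axxxxx", "FLOWCELL1", "1", "ATGC+ATGC",
    "/path/to/sample1_R1.fastq.gz", "/path/to/sample1_R2.fastq.gz",
    "ACGT,ACGT",
]) + "\n"

problems = check_units(read_tsv(SAMPLES_TSV), read_tsv(UNITS_TSV))
print(problems or "units.tsv is consistent with samples.tsv")
```

A check like this could be run before starting the workflow so that mismatched sample ids or a misspelled type surface early rather than partway through a run.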
The workflow repository contains a small test dataset in `.tests/integration`, which can be run like so:
$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --use-singularity
To run this pipeline, the requirements in `requirements.txt` must be installed. It is most straightforward to install them inside a Python virtual environment created with the Python venv module. The `samples.tsv`, `units.tsv`, `resources.yaml`, and `config.yaml` files need to be available in the config directory (or their locations otherwise specified in `config.yaml`). You always need to specify the config file, either in the profile YAML file or in the snakemake command.
Running the pipeline on CPU:
module load slurm-drmaa
module load singularity/3.11.0
python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pipeline_path=/path/to/pipeline
snakemake --profile ${pipeline_path}/profiles/slurm/ -s ${pipeline_path}/workflow/Snakefile --prioritize prealignment_fastp_pe \
-p --configfile config/config.yaml --config aligner=bwa_cpu snp_caller=deepvariant_cpu
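
Before launching, it can help to confirm that the four files named above are present. The sketch below assumes the default layout, with everything directly inside a `config` directory; it does not cover the case where the locations are overridden in `config.yaml`. The function name `missing_config_files` is hypothetical and not part of the pipeline.

```python
from pathlib import Path

# File names as listed in the text above; default location ASSUMED to be config/.
EXPECTED_FILES = ("samples.tsv", "units.tsv", "resources.yaml", "config.yaml")


def missing_config_files(config_dir="config"):
    """Return the expected file names that are absent from config_dir."""
    directory = Path(config_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("missing from config/:", ", ".join(missing))
    else:
        print("all expected config files found")
```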
To create a reference for ExomeDepth based on the samples in samples_ref.tsv and units_ref.tsv, config_reference.yaml must also be passed in the command. Note that the trailing -n makes the command below a dry run; remove it to actually execute the jobs:
snakemake --profile ${pipeline_path}/profiles/slurm/ -s ${pipeline_path}/workflow/Snakefile --prioritize prealignment_fastp_pe \
-p --configfiles config/config.yaml config/config_reference.yaml --config aligner=bwa_cpu snp_caller=deepvariant_cpu --notemp -n
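
The reference command passes two config files plus `--config` overrides. As far as the Snakemake documentation describes it, later config files overwrite values from earlier ones, and `--config` key=value pairs overwrite both. The sketch below is a simplified model of that layering, not Snakemake's own code. The keys `samples` and `units` and the value `some_default` are hypothetical; the actual keys in config.yaml and config_reference.yaml may differ.

```python
def deep_merge(base, override):
    """Return a copy of base with override layered on top, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# HYPOTHETICAL contents standing in for config/config.yaml.
config = {
    "samples": "config/samples.tsv",
    "units": "config/units.tsv",
    "aligner": "some_default",
}
# HYPOTHETICAL contents standing in for config/config_reference.yaml.
config_reference = {
    "samples": "config/samples_ref.tsv",
    "units": "config/units_ref.tsv",
}
# Values given on the command line via --config.
cli_overrides = {"aligner": "bwa_cpu", "snp_caller": "deepvariant_cpu"}

# Apply the layers in the order they appear in the command.
effective = deep_merge(deep_merge(config, config_reference), cli_overrides)
for key in sorted(effective):
    print(f"{key}: {effective[key]}")
```

Under these assumptions, the reference run would read the `_ref` sample and unit tables while keeping every other setting from config.yaml.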
This pipeline is designed to call germline variants from Illumina whole-genome sequencing data.
The workflow repository contains a dry-run test of the pipeline in `.tests/integration`, which can be run like so:
$ cd .tests/integration
$ snakemake -n -s ../../workflow/Snakefile --configfile config/config.yaml