clinical-genomics-uppsala/hastings_rd_wes

🐍 hydra-genetics/hastings_rd_wes

Whole exome sequencing (WES) hg38 hydra-genetics pipeline for rare diseases

License: GPL-3

❗ Dependencies

In order to use this module, the following dependencies are required:

hydra-genetics, pandas, python, snakemake, singularity

🎒 Preparations

Sample data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

| Column Id | Description |
| --- | --- |
| samples.tsv | |
| sample | unique sample/patient id, one per row |
| units.tsv | |
| sample | same sample/patient id as in samples.tsv |
| type | data type identifier (one letter), one of T (Tumor), N (Normal), R (RNA) |
| platform | type of sequencing platform, e.g. NovaSeq |
| machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
| flowcell | identifier of the flowcell used |
| lane | flowcell lane number |
| barcode | sequence library barcode/index, connect forward and reverse indices by +, e.g. ATGC+ATGC |
| fastq1/2 | absolute paths to the forward and reverse reads |
| adapter | adapter sequences to be trimmed, separated by comma |
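Several of the units.tsv columns (machine, flowcell, lane, barcode) can usually be read off the first header line of a FASTQ file. The following is a minimal sketch, assuming the standard Illumina header layout (CASAVA 1.8 and later); the function name `units_fields_from_header` and the example header are illustrative and not part of the pipeline. Whether the leading `@` belongs in the machine column is an assumption taken from the `@Axxxxx` example above.

```python
import re

# Assumed Illumina (CASAVA 1.8+) header layout:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
HEADER = re.compile(
    r"^@(?P<machine>[^:]+):(?P<run>[^:]+):(?P<flowcell>[^:]+):(?P<lane>\d+)"
    r":[^ ]+ \d:[YN]:\d+:(?P<barcode>\S+)$"
)


def units_fields_from_header(header: str) -> dict:
    """Extract the units.tsv fields that can be read off one FASTQ header line."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ header: {header!r}")
    return {
        # Assumption: keep the leading "@", as in the "@Axxxxx" example.
        "machine": "@" + match["machine"],
        "flowcell": match["flowcell"],
        "lane": match["lane"],
        "barcode": match["barcode"],
    }


if __name__ == "__main__":
    # Made-up header, for illustration only.
    example = "@A00123:45:HXXXXXXXX:1:1101:1000:1000 1:N:0:ATGC+ATGC"
    print(units_fields_from_header(example))
```

The remaining columns (sample, type, platform, fastq1/2, adapter) are not encoded in the header and still have to be filled in by hand.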

✅ Testing

The workflow repository contains a small test dataset in .tests/integration, which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../workflow/Snakefile -j1 --use-singularity

🚀 Usage

To run this pipeline, the requirements in requirements.txt must be installed. The most straightforward way is to install them inside a Python virtual environment created with the venv module. The samples.tsv, units.tsv, resources.yaml, and config.yaml files need to be available in the config directory (or at the locations otherwise specified in config.yaml). You always need to specify the config file, either in the profile YAML file or in the snakemake command.

Running the pipeline on CPU:

module load slurm-drmaa
module load singularity/3.11.0

python3.9 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

pipeline_path=/path/to/pipeline

snakemake  --profile ${pipeline_path}/profiles/slurm/ -s ${pipeline_path}/workflow/Snakefile --prioritize prealignment_fastp_pe \
 -p  --configfile config/config.yaml --config aligner=bwa_cpu snp_caller=deepvariant_cpu
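Before submitting jobs, it can help to confirm that the configuration files named above are in place and that every sample in units.tsv also appears in samples.tsv. The sketch below assumes the default layout (all four files directly inside the config directory) and tab-separated files with a `sample` column; the function names `read_column` and `preflight` are hypothetical helpers, not part of the pipeline.

```python
import csv
from pathlib import Path

# Assumption: the default file names and location described in this README.
REQUIRED = ("samples.tsv", "units.tsv", "resources.yaml", "config.yaml")


def read_column(path: Path, column: str) -> list[str]:
    """Return all values of one column from a tab-separated file with a header row."""
    with path.open(newline="") as handle:
        return [row[column] for row in csv.DictReader(handle, delimiter="\t")]


def preflight(config_dir: str = "config") -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    root = Path(config_dir)
    problems = [
        f"missing file: {root / name}"
        for name in REQUIRED
        if not (root / name).is_file()
    ]
    if problems:
        return problems  # no point comparing sample ids if files are missing

    samples = set(read_column(root / "samples.tsv", "sample"))
    units = set(read_column(root / "units.tsv", "sample"))
    for sample in sorted(units - samples):
        problems.append(f"units.tsv sample not in samples.tsv: {sample}")
    return problems


if __name__ == "__main__":
    for problem in preflight("config"):
        print(problem)
```

If config.yaml points to the sample files at other locations, the paths in this check would need to be adjusted accordingly.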

To create an exomedepth reference from the samples listed in samples_ref.tsv and units_ref.tsv, pass config_reference.yaml as an additional config file in the command. The trailing -n makes this a dry run; remove it to execute the workflow:

snakemake  --profile ${pipeline_path}/profiles/slurm/ -s ${pipeline_path}/workflow/Snakefile --prioritize prealignment_fastp_pe \
 -p  --configfiles config/config.yaml config/config_reference.yaml --config aligner=bwa_cpu snp_caller=deepvariant_cpu --notemp -n
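The command above layers three sources of configuration: config.yaml, then config_reference.yaml, then the `--config` key=value pairs. According to Snakemake's documentation, config files given later override earlier ones, and `--config` values override both. The sketch below illustrates that precedence with a recursive dictionary merge; it is not Snakemake's own code, and the dictionary contents (including the `samples` and `units` keys and the `bwa_gpu` value) are hypothetical stand-ins for the real files.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents standing in for config.yaml, config_reference.yaml
# and the --config arguments on the command line.
config_yaml = {
    "samples": "config/samples.tsv",
    "units": "config/units.tsv",
    "aligner": "bwa_gpu",
}
config_reference_yaml = {
    "samples": "config/samples_ref.tsv",
    "units": "config/units_ref.tsv",
}
command_line = {"aligner": "bwa_cpu", "snp_caller": "deepvariant_cpu"}

# Apply the layers in the order they appear on the command line.
effective = merge_config(merge_config(config_yaml, config_reference_yaml), command_line)

if __name__ == "__main__":
    print(effective)
```

Under these assumptions, the reference run would read its samples from samples_ref.tsv and units_ref.tsv while keeping every other setting from config.yaml.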

💬 Introduction

This pipeline is designed to call germline variants from Illumina whole exome sequencing data.

✅ Testing

The workflow repository contains a dry run test of the pipeline in .tests/integration which can be run like so:

$ cd .tests/integration
$ snakemake -n -s ../../workflow/Snakefile --configfile config/config.yaml