👑 WGS Leukemia Tumor Only Königskobra 🐍

Snakemake workflow to analyse hematological malignancies in whole genome data when only a tumor sample is available

💬 Introduction

This Snakemake workflow uses modules from hydragenetics to process .fastq files and obtain different kinds of variants (SNVs, indels, CNVs, SVs). Alongside diagnosis-filtered .vcf files, the workflow produces a MultiQC report (.html file) and some CNV plots. One of the modules contains the commercial parabricks toolkit, which can be replaced by open-source GATK tools if required. The following modules are currently part of this pipeline:

annotation
cnv_sv
compression
misc
parabricks
prealignment
qc

❗ Dependencies

In order to use this module, the following dependencies are required:

🎒 Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

Column Id	Description
`samples.tsv`
sample	unique sample/patient id, one per row
tumor_content	ratio of tumor cells to total cells
`units.tsv`
sample	same sample/patient id as in `samples.tsv`
type	one-letter data type identifier: `T` (tumor), `N` (normal) or `R` (RNA)
platform	type of sequencing platform, e.g. `NovaSeq`
machine	specific machine id, e.g. NovaSeq instruments have `@Axxxxx`
flowcell	identifier of the flowcell used
lane	flowcell lane number
barcode	sequence library barcode/index, connect forward and reverse indices by `+`, e.g. `ATGC+ATGC`
fastq1/2	absolute path to forward and reverse reads
adapter	adapter sequences to be trimmed, separated by comma
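
The two tables can be checked before the workflow is started. The following is a minimal sketch, not part of the workflow itself: it assumes both files are tab-separated with a header row, that the `fastq1/2` entry corresponds to two columns named `fastq1` and `fastq2`, and the helper names (`read_tsv`, `check_tables`) as well as the example values are made up for illustration.

```python
import csv
import io

# Column names from the table above; "fastq1" and "fastq2" are the assumed
# spelling of the "fastq1/2" entry.
SAMPLES_COLUMNS = ["sample", "tumor_content"]
UNITS_COLUMNS = ["sample", "type", "platform", "machine", "flowcell",
                 "lane", "barcode", "fastq1", "fastq2", "adapter"]


def read_tsv(handle, required):
    """Read a tab-separated table and fail early if a required column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    return list(reader)


def check_tables(samples, units):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    ids = [row["sample"] for row in samples]
    for dup in sorted({s for s in ids if ids.count(s) > 1}):
        problems.append(f"samples.tsv: sample '{dup}' occurs more than once")
    for i, row in enumerate(units, start=1):
        if row["sample"] not in ids:
            problems.append(f"units.tsv row {i}: sample '{row['sample']}' not in samples.tsv")
        if not (row["fastq1"].startswith("/") and row["fastq2"].startswith("/")):
            problems.append(f"units.tsv row {i}: fastq paths must be absolute")
    return problems


# In-memory example with made-up values; real use would open the two files.
samples_text = "sample\ttumor_content\nS1\t0.8\n"
units_text = ("\t".join(UNITS_COLUMNS) + "\n"
              "S1\tT\tNovaSeq\t@A00000\tFLOWCELL1\t1\tATGC+ATGC\t"
              "/data/S1_R1.fastq.gz\t/data/S1_R2.fastq.gz\tACGT,ACGT\n")
samples = read_tsv(io.StringIO(samples_text), SAMPLES_COLUMNS)
units = read_tsv(io.StringIO(units_text), UNITS_COLUMNS)
print(check_tables(samples, units))
```

For real input, `io.StringIO(...)` would be replaced by `open("samples.tsv", newline="")` and `open("units.tsv", newline="")`.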

Reference data

Reference files should be specified in config.yaml.

A .fasta reference file of the human genome is required, as well as an .fai file and a bwa index of this file.
A .vcf file containing known indel sites. For GRCh38, this file is available as part of the Broad GATK resource bundle on Google Cloud.
An .interval_list file containing all whole genome calling regions. The GRCh38 version is also available on Google Cloud.
The trimmer_software should be specified by naming the rule to be used for trimming. This pipeline uses fastp_pe.
.bed files defining regions of interest for different diagnoses. This pipeline assumes the diagnoses ALL and AML, with different gene lists for SNVs and SVs.
For pindel, a .bed file containing the region that the analysis should be limited to.
simple_sv_annotation comes with a panel and a fusion pair list, both of which should also be included in config.yaml.
For annotation with SnpEff, a database is needed, which can be downloaded through the CLI.
For VEP, a cache resource should be downloaded prior to running the workflow.
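
Whether the .fasta file has its companion files in place can be checked up front. The sketch below assumes the index was built with classic `bwa index` using the default prefix (which writes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` next to the .fasta) and that the .fai file sits next to the .fasta as well. The function name `missing_reference_files` is made up for illustration.

```python
from pathlib import Path

# Suffixes written by `bwa index` next to the .fasta file (default prefix).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_reference_files(fasta):
    """Return the .fasta file and its companions that are not present on disk."""
    fasta = Path(fasta)
    expected = [fasta, Path(f"{fasta}.fai")]
    expected += [Path(f"{fasta}{suffix}") for suffix in BWA_INDEX_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]
```

Calling `missing_reference_files` with the .fasta path from config.yaml returns an empty list when everything is in place, otherwise the paths that still need to be created.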

🚀 Usage

Running the workflow requires a resources.yaml file, which defines default resources as well as resources for individual rules. For parabricks, the gres stanza is required and should specify the number of GPUs available.

snakemake --profile my-profile
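
The idea of defaults plus per-rule settings can be illustrated with a small lookup. This is only a sketch of the concept, not how Snakemake or hydragenetics implement it: the dictionary stands in for a parsed resources.yaml, and every key, rule name and value in it (including the form of the `gres` entry) is hypothetical.

```python
# Hypothetical stand-in for a parsed resources.yaml; all keys and values
# below are assumptions for illustration only.
RESOURCES = {
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "04:00:00"},
    "hypothetical_gpu_rule": {"threads": 16, "gres": "gpu:2"},
}


def resources_for(rule, resources):
    """Start from the defaults and let rule-specific entries override them."""
    merged = dict(resources.get("default_resources", {}))
    merged.update(resources.get(rule, {}))
    return merged


print(resources_for("hypothetical_gpu_rule", RESOURCES))
print(resources_for("some_other_rule", RESOURCES))
```

A rule without its own entry simply receives the defaults, while the GPU rule keeps the default `mem_mb` and `time` but gets its own `threads` and `gres`.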

Relevant output files

File	Description
`cnv_sv/cnvkit_diagram/{sample}_T.png`	chromosome diagram from cnvkit
`cnv_sv/cnvkit_scatter/{sample}_T_{chromosome}.png`	scatter plot per chromosome from cnvkit
`cnv_sv/cnvkit_vcf/{sample}_T.vcf`	`.vcf` output from cnvkit
`cnv_sv/pindel/{sample}.vcf`	`.vcf` output from pindel
`compression/crumble/{sample}_{type}.crumble.cram`	crumbled `.cram` file
`compression/crumble/{sample}_{type}.crumble.cram.crai`	index for crumbled `.cram` file
`compression/spring/{sample}_{flowcell}_{lane}_{barcode}_{type}.spring`	compressed `.fastq` file pair
`tsv_files/{sample}_mutectcaller_t.aml.tsv`	`.tsv` file for Excel containing SNVs from mutect2 for AML
`tsv_files/{sample}_mutectcaller_t.all.tsv`	`.tsv` file for Excel containing SNVs from mutect2 for ALL
`tsv_files/{sample}_manta_t.aml.tsv`	`.tsv` file for Excel containing SVs from manta for AML
`tsv_files/{sample}_manta_t.all.tsv`	`.tsv` file for Excel containing SVs from manta for ALL
`qc/multiqc/multiqc.html`	`.html` report from MultiQC
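
To see which files to expect for a given set of samples, the path patterns from the table can be expanded. The sketch below covers only the patterns whose sole wildcard is `{sample}`, plus the MultiQC report. The per-chromosome, crumble and spring patterns need additional wildcards and are left out. The function name `expected_outputs` and the sample ids in the usage note are made up.

```python
# Patterns copied from the table above (only those that need just {sample}).
OUTPUT_TEMPLATES = [
    "cnv_sv/cnvkit_diagram/{sample}_T.png",
    "cnv_sv/cnvkit_vcf/{sample}_T.vcf",
    "cnv_sv/pindel/{sample}.vcf",
    "tsv_files/{sample}_mutectcaller_t.aml.tsv",
    "tsv_files/{sample}_mutectcaller_t.all.tsv",
    "tsv_files/{sample}_manta_t.aml.tsv",
    "tsv_files/{sample}_manta_t.all.tsv",
    "qc/multiqc/multiqc.html",
]


def expected_outputs(samples):
    """Expand each pattern once per sample; patterns without {sample} appear once."""
    paths = []
    for template in OUTPUT_TEMPLATES:
        if "{sample}" in template:
            paths.extend(template.format(sample=s) for s in samples)
        else:
            paths.append(template)
    return paths
```

For example, `expected_outputs(["S1", "S2"])` lists the per-sample files for both samples followed by the single MultiQC report.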
