Pharmcat update 2.12.0 #38

Merged (5 commits), May 27, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -33,4 +33,6 @@ create_pgx_samplesheet.sh
# Other files
bin/report_template.txt
bin/snakemake_report.py
bin/pdf.py
bin/pdf.py

subworkflows/local/pharmacoGenomics.nf.backup
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
# v2.0.0
- Major change to the process flow
- Updated PharmCAT to v2.12.0
- Updated README.md

# v1.1.1
- Updated QC text in the report

46 changes: 26 additions & 20 deletions README.md
@@ -1,6 +1,6 @@
<hr>

[![Nextflow DSL2](https://img.shields.io/badge/NextFlow_DSL2-23.04.0-23.svg)](https://www.nextflow.io/docs/latest/dsl2.html) [![Singularity Version](https://img.shields.io/badge/Singularity-%E2%89%A53.8.0-orange)](https://sylabs.io/docs/) [![Run with Singularity](https://img.shields.io/badge/Run%20with-Singularity-orange)](https://sylabs.io/docs/)
[![Nextflow DSL2](https://img.shields.io/badge/NextFlow_DSL2-23.04.0-23.svg)](https://www.nextflow.io/docs/latest/dsl2.html) [![Singularity Version](https://img.shields.io/badge/Singularity-%E2%89%A53.8.0-orange)](https://sylabs.io/docs/) [![PharmCAT Version](https://img.shields.io/badge/PharmCAT-2.12.0-green)](https://pharmcat.org/) [![Run with Singularity](https://img.shields.io/badge/Run%20with-Singularity-orange)](https://sylabs.io/docs/)

[![PharmGKB](https://img.shields.io/badge/PharmGKB-blue)](https://www.pharmgkb.org/) [![CPIC](https://img.shields.io/badge/CPIC-green)](https://cpicpgx.org/) [![PharmVar](https://img.shields.io/badge/PharmVar-yellow)](https://www.pharmvar.org/)
[![PharmCAT](https://img.shields.io/badge/Support_for-PharmCAT-orange)](https://pharmcat.org/)
@@ -13,7 +13,8 @@

Welcome to PGxModule: Revolutionizing Genomic Medicine!

PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrate into your genomics pipeline. It empowers you to generate sample-specific reports with clinical guidelines, leveraging state-of-the-art variant detection in Genomic Medicine Sweden sequencing panels. This workflow is inspired by JoelAAs.
PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrate into your genomics pipeline. It empowers you to generate sample-specific reports with clinical guidelines, leveraging state-of-the-art variant detection in Genomic Medicine Sweden sequencing panels. This workflow is inspired by JoelAAs. In addition, this pipeline now produces a [PharmCAT](https://pharmcat.org/) (Pharmacogenomics Clinical Annotation Tool) report, giving recommendations for all detected haplotypes directly from CPIC.

### Key Features:

@@ -24,7 +25,7 @@ PGxModule is an advanced Nextflow DSL2 workflow, designed to seamlessly integrat

## Pipeline Summary

The pipeline focuses on 19 SNPs from TPMT, DPYD, and NUDT15 genes, with plans to incorporate additional genes in future updates. The target selection is meticulously curated from reputable databases such as [PharmGKB](https://www.pharmgkb.org/) and [PharmVar](https://www.pharmvar.org/), guided by [CPIC](https://cpicpgx.org/) recommendations. As the pipeline evolves, it aims to broaden its scope, providing a more comprehensive analysis of pharmacogenomic variations to enhance clinical insights.
This pipeline branches into two analyses. The first focuses on 19 SNPs from the TPMT, DPYD, and NUDT15 genes, with plans to incorporate additional genes in future updates. The second uses the external tool [PharmCAT](https://pharmcat.org/), developed by [PharmGKB](https://www.pharmgkb.org/), to call as many haplotypes as possible without subsetting the original BAM; these haplotypes are then annotated and reported together with clinical recommendations. The target selection is meticulously curated from reputable databases such as [PharmGKB](https://www.pharmgkb.org/) and [PharmVar](https://www.pharmvar.org/), guided by [CPIC](https://cpicpgx.org/) recommendations. As the pipeline evolves, it aims to broaden its scope, providing a more comprehensive analysis of pharmacogenomic variations to enhance clinical insights.
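A minimal invocation sketch, assuming the parameter names introduced by this PR's config (`params.pharmcat` gates the PharmCAT processes through `ext.when`, and `params.haplotype_caller` selects GATK or Sentieon); the CSV path and profile names are placeholders:

```bash
# Hedged example: parameter names come from configs/modules/pharmacogenomics.config
# in this PR; paths and profiles are illustrative only.
nextflow run main.nf \
    --csv /path/to/csv/input.csv \
    -profile "panel,hg38,solid" \
    --haplotype_caller SENTIEON \
    --pharmcat
```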


## Pipeline Steps
@@ -33,30 +34,35 @@ The PGxModule pipeline was executed with Nextflow version 23.04.2. The pipeline

1. **CSV Validation**
The CSV Validation step ensures the correctness and integrity of the input CSV file. It checks for proper formatting, required fields, and data consistency, providing a foundation for accurate downstream processing in the PGxModule pipeline.
2. **Getting Ontarget Bam**
This step involves extracting the on-target BAM files from the analyzed samples. These BAM files specifically capture the sequencing data aligned to the regions of interest, enabling reduction in time and focused analysis on the genomic regions relevant to the pharmacogenomic study.
3. **Haplotype Calling**
2. **Haplotype Calling**
Haplotype Calling is a crucial stage where the pipeline identifies and assembles haplotypes from the sequencing data. This process is fundamental in characterizing the genetic variations present in the samples, laying the groundwork for subsequent analyses and variant interpretation.
4. **Haplotype Annotation**
Haplotypes which are called are annotated with dbSNP ids.
5. **Haplotype Filtration**
3. **Haplotype Filtration**
Haplotype Filtration focuses on refining the set of identified haplotypes, applying specific criteria to select variants of interest and discard noise. This process enhances the precision of the haplotype dataset, ensuring that downstream analyses are based on high-quality and clinically relevant variants.
6. **Coverage Analysis**
Coverage Analysis evaluates the sequencing depth across targeted regions, providing insights into the reliability of variant calls. By assessing coverage, this step identifies regions with insufficient data and informs the overall confidence in the accuracy of the genomic information obtained from the samples.
7. **Detection of variants**
4. **PharmCAT Preprocessing**
A script preprocesses the VCF files for PharmCAT, ensuring compliance with VCF v4.2, stripping positions that are not relevant to PGx, normalizing variants, and optionally filtering sample data.
5. **PharmCAT**
This step matches the VCF positions against PharmCAT's pharmacogenomic positions, calls the phenotypes, and finally generates the PharmCAT report with all recommendations (a hedged command sketch for steps 4 and 5 follows this list).
6. **Ontarget VCF**
This step extracts the on-target VCF positions from the analyzed samples (see the second sketch after this list).
7. **Haplotype Annotation**
Called haplotypes are annotated with dbSNP IDs.
8. **Detection of variants**
The variants of interest are checked against the full set of called haplotypes and used for further analysis.
8. **Clinical Recommendations**
9. **Clinical Recommendations**
Identified haplotypes are annotated with haplotype IDs, clinical recommendations, and interaction guidelines based on CPIC.
9. **Report**
10. **Getting Ontarget Bam**
This step involves extracting the on-target BAM files from the analyzed samples. These BAM files specifically capture the sequencing data aligned to the regions of interest, enabling reduction in time and focused analysis on the genomic regions relevant to the pharmacogenomic study.
11. **Coverage Analysis**
Coverage Analysis evaluates the sequencing depth across targeted regions, providing insights into the reliability of variant calls. By assessing coverage, this step identifies regions with insufficient data and informs the overall confidence in the accuracy of the genomic information obtained from the samples.
12. **Report**
The Report step consolidates the findings from the preceding analyses into a comprehensive report. This report includes detailed information on detected variants, clinical guidelines, interaction assessments, and other relevant pharmacogenomic insights. It serves as a valuable resource for clinicians and researchers, aiding in informed decision-making based on the genomic characteristics of the analyzed samples.
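The two PharmCAT steps above (4 and 5) run inside the `pharmcat_2.12.0.sif` container. The following is a hedged sketch of roughly what they do, using the `pharmcat_vcf_preprocessor.py` and `pharmcat_pipeline` entry points from the PharmCAT documentation; the exact commands live in this pipeline's PharmCAT modules (not part of this diff), and only the reporter flags are copied from `configs/modules/pharmacogenomics.config`:

```bash
# Hedged sketch of steps 4-5; entry points and intermediate file names follow the
# PharmCAT docs and may differ from the actual module scripts in this repository.
set -euo pipefail
sample=Sample1

# Step 4: normalize the VCF and keep only PGx-relevant positions
pharmcat_vcf_preprocessor.py -vcf ${sample}.filtered.haplotypes.vcf.gz -o .

# Step 5: match haplotypes, call phenotypes, and write the CPIC-based report
# (reporter flags mirror ext.args of the PHARMCAT process in this PR)
pharmcat_pipeline ${sample}.preprocessed.vcf \
    --reporter-sources CPIC \
    --matcher-save-html \
    --reporter-save-json \
    --reporter-title ${sample}
```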

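Steps 6 and 7 are standard bcftools operations. A sketch under the assumption that the on-target subsetting uses a BED file of PGx target regions; only the dbSNP annotation arguments (`-a`, `-c ID`) and the output file naming are taken from the `BCFTOOLS_ANNOTATION` config and module in this PR:

```bash
# Hedged sketch of steps 6-7; file names are placeholders and the exact options
# used by the ONTARGET_VCF module are not visible in this diff.
set -euo pipefail
sample=Sample1

# Step 6: restrict the filtered haplotype VCF to the on-target regions
bcftools view -R pgx_target_regions.bed \
    -O z -o ${sample}.filtered.ontarget.haplotypes.vcf.gz \
    ${sample}.filtered.haplotypes.vcf.gz
tabix -p vcf ${sample}.filtered.ontarget.haplotypes.vcf.gz

# Step 7: annotate the on-target haplotypes with dbSNP IDs
# (-a/-c mirror ext.args of BCFTOOLS_ANNOTATION; dbsnp.vcf.gz is a placeholder)
bcftools annotate -a dbsnp.vcf.gz -c ID \
    -o ${sample}.filtered.ontarget.haplotypes.anno.vcf \
    ${sample}.filtered.ontarget.haplotypes.vcf.gz
```
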
## Example Input CSV

| clarity_sample_id | id | type | assay | group | bam | bai | purity |
|-------------------|---------|------|-----------|---------|---------------------------------------|---------------------------------------|--------|
| CMD123456 | Sample1 | T | solid-pgx | Sample1 | Sample1.T.bwa.umi.sort.bam | Sample1.T.bwa.umi.sort.bam.bai | 0.30 |
| CMD987654 | Sample2 | T | solid-pgx | Sample2 | Sample2.T.bwa.umi.sort.bam | Sample2.T.bwa.umi.sort.bam.bai | 0.30 |

| clarity_sample_id | id | type | assay | group | bam | bai | purity |
|-------------------|---------|------|-------------|---------|--------------------------------------|--------------------------------------|--------|
| XXX000001 | Sample1 | T | gmssolidpgx | Sample1 | Sample1.T.bwa.umi.sort.bam | Sample1.T.bwa.umi.sort.bam.bai | 0.30 |
| XXX000002 | Sample2 | T | gmssolidpgx | Sample2 | Sample2.T.bwa.umi.sort.bam | Sample2.T.bwa.umi.sort.bam.bai | 0.30 |


## Setup
@@ -111,5 +117,5 @@ nextflow run main.nf --csv /path/to/csv/input.csv -profile "panel,hg38,solid" --

## Workflow Image

<img src="resources/workflow_images/PGx.png" alt="Workflow Image" width="50%">
<img src="resources/workflow_images/PGXModule_Workflow_v2.0.0.png" alt="Workflow Image" width="50%">

80 changes: 48 additions & 32 deletions configs/modules/pharmacogenomics.config
@@ -47,19 +47,6 @@ process {
ext.args = { " --target_bed=${params.pgx_target_regions} --padding=${params.padding} --addchr=${params.addchr} "}
}

withName: '.*PHARMACO_GENOMICS:ONTARGET_BAM' {
container = "${params.container_dir}/samtools.simg"

publishDir = [
path: "${params.outdir}/${params.subdir}/bam/",
mode: 'copy',
overwrite: true,
pattern: "*.bam*"
]

ext.args = { " -h -M " }
ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:GATK_HAPLOTYPING' {
container = "${params.container_dir}/gatk4.simg"
@@ -77,7 +64,7 @@
}

withName: '.*PHARMACO_GENOMICS:SENTIEON_HAPLOTYPING' {
container = "${params.container_dir}/sentieon_202112.sif"
container = "${params.container_dir}/sentieon_202308.01.sif"

publishDir = [
path: "${params.outdir}/${params.subdir}/vcf/sentieon",
@@ -92,36 +79,22 @@
ext.when = { params.haplotype_caller == 'SENTIEON' }
}

withName: '.*PHARMACO_GENOMICS:BCFTOOLS_ANNOTATION' {
container = "${params.container_dir}/bcftools.sif"

publishDir = [
path: "${params.outdir}/${params.subdir}/vcf/",
mode: 'copy',
overwrite: true,
pattern: "*.vcf"
]

ext.args = { "-a ${params.dbSNP} -c ID" }
ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:VARIANT_FILTRATION' {
container = "${params.container_dir}/target_variants_python.simg"
container = "${params.container_dir}/target_variants_python.sif"

publishDir = [
path: "${params.outdir}/${params.subdir}/vcf/",
mode: 'copy',
overwrite: true,
pattern: "*.vcf"
pattern: "*.filtered.haplotypes.vcf.gz*"
]

ext.args = { " --read_ratio=${params.read_ratio} --depth=${params.read_depth} "}
ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:PHARMCAT_PREPROCESSING' {
container = "${params.container_dir}/pharmcat_2.9.0.sif"
container = "${params.container_dir}/pharmcat_2.12.0.sif"
containerOptions = ' --contain '

publishDir = [
@@ -136,7 +109,7 @@
}

withName: '.*PHARMACO_GENOMICS:PHARMCAT' {
container = "${params.container_dir}/pharmcat_2.9.0.sif"
container = "${params.container_dir}/pharmcat_2.12.0.sif"
containerOptions = ' --contain '

publishDir = [
@@ -156,8 +129,37 @@

ext.prefix = { "${meta.group}" }
ext.when = params.pharmcat
ext.args = {" --reporter-sources CPIC --matcher-save-html --reporter-save-json -ma -re --reporter-title ${meta.group} "}
}

withName: '.*PHARMACO_GENOMICS:ONTARGET_VCF' {
container = "${params.container_dir}/bcftools_1.20.sif"

publishDir = [
path: "${params.outdir}/${params.subdir}/vcf/",
mode: 'copy',
overwrite: true,
pattern: "*.ontarget.*.vcf.gz*"
]

ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:BCFTOOLS_ANNOTATION' {
container = "${params.container_dir}/bcftools_1.20.sif"

publishDir = [
path: "${params.outdir}/${params.subdir}/vcf/",
mode: 'copy',
overwrite: true,
pattern: "*.filtered.ontarget.haplotypes.anno.vcf"
]

ext.args = { "-a ${params.dbSNP} -c ID" }
ext.prefix = { "${meta.group}" }
}


withName: '.*PHARMACO_GENOMICS:DETECTED_VARIANTS' {
container = "${params.container_dir}/target_variants_python.simg"

@@ -186,6 +188,20 @@
ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:ONTARGET_BAM' {
container = "${params.container_dir}/samtools.simg"

publishDir = [
path: "${params.outdir}/${params.subdir}/bam/",
mode: 'copy',
overwrite: true,
pattern: "*ontarget*.bam*"
]

ext.args = { " -h -M " }
ext.prefix = { "${meta.group}" }
}

withName: '.*PHARMACO_GENOMICS:DEPTH_OF_TARGETS' {
container = "${params.container_dir}/gatk3.simg"

2 changes: 1 addition & 1 deletion envs/get_cointainers.sh
@@ -9,7 +9,7 @@ sudo singularity build jinja_report.sif recepies/jinja_report
sudo singularity build samtools.sif recepies/samtools
sudo singularity build gatk3.sif docker://broadinstitute/gatk3:3.8-1
sudo singularity build gatk4.sif docker://broadinstitute/gatk
sudo singularity build bcftools.sif docker://staphb/bcftools
sudo singularity build bcftools_1.20.sif recepies/bcftools
sudo singularity build pharmcat.sif docker://pgkb/pharmcat


8 changes: 8 additions & 0 deletions envs/recepies/bcftools
@@ -0,0 +1,8 @@
Bootstrap: docker
From: staphb/bcftools:1.20

%post
export DEBIAN_FRONTEND=noninteractive

apt-get -y update
apt-get -y install tabix
5 changes: 3 additions & 2 deletions envs/recepies/get_target_variants
@@ -10,7 +10,8 @@ From: ubuntu:18.04
apt-get -y install zlib1g-dev
apt-get -y install libbz2-dev
apt-get -y install liblzma-dev
apt-get -y install tabix
apt-get -y install python3-pysam
pip3 install pandas==1.0.3
pip3 install argparse==1.4.0
pip3 install cython
pip3 install pysam==0.15.4
pip3 install cython
1 change: 1 addition & 0 deletions envs/recepies/samtools
@@ -5,3 +5,4 @@ From: ubuntu:18.04

apt-get -y update
apt-get -y install samtools
apt-get -y install tabix
4 changes: 2 additions & 2 deletions modules/local/annotation/main.nf
@@ -17,7 +17,7 @@ process BCFTOOLS_ANNOTATION {
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.group}"
"""
bcftools annotate --threads ${task.cpus} $args -o ${prefix}".haplotypes.anno.vcf" $vcf
bcftools annotate --threads ${task.cpus} $args -o ${prefix}".filtered.ontarget.haplotypes.anno.vcf" $vcf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
@@ -28,7 +28,7 @@
stub:
def prefix = task.ext.prefix ?: "${meta.group}"
"""
touch ${prefix}".haplotypes.anno.vcf"
touch ${prefix}".filtered.ontarget.haplotypes.anno.vcf"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
20 changes: 14 additions & 6 deletions modules/local/filtration/main.nf
@@ -4,11 +4,11 @@ process VARIANT_FILTRATION {
tag "$meta.group"

input:
tuple val(group), val(meta), file(vcf)
tuple val(group), val(meta), file(vcf), file(tbi)

output:
tuple val(group), val(meta), file("*.filtered.vcf"), emit: haplotypes_filtered
path "versions.yml", emit: versions
tuple val(group), val(meta), file("*.filtered.haplotypes.vcf.gz"), file("*.filtered.haplotypes.vcf.gz.tbi"), emit: haplotypes_filtered
path "versions.yml", emit: versions

when:
task.ext.when == null || task.ext.when
@@ -17,25 +17,33 @@
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.group}"
"""
gunzip -c $vcf > ${prefix}.haplotypes.vcf
variant_filtration.py \
--input_vcf=$vcf \
--input_vcf=${prefix}.haplotypes.vcf \
$args \
--output_file=${prefix}.haplotypes.anno.filtered.vcf
--output_file=${prefix}.filtered.haplotypes.vcf

bgzip -c ${prefix}.filtered.haplotypes.vcf > ${prefix}.filtered.haplotypes.vcf.gz
tabix -p vcf ${prefix}.filtered.haplotypes.vcf.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python3 --version 2>&1 | sed -e 's/Python //g')
bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g')
tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//')
END_VERSIONS
"""

stub:
def prefix = task.ext.prefix ?: "${meta.group}"
"""
touch ${prefix}.haplotypes.anno.filtered.vcf
touch ${prefix}.filtered.haplotypes.vcf.gz ${prefix}.filtered.haplotypes.vcf.gz.tbi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python3 --version 2>&1 | sed -e 's/Python //g')
bgzip: \$(bgzip --v | grep 'bgzip' | sed 's/.* //g')
tabix: \$(echo \$(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*\$//')
END_VERSIONS
"""
