Commit

Merge pull request #395 from genomic-medicine-sweden/fuseq_wes_update
feat: Fuseq wes update
jonca79 authored Feb 6, 2024
2 parents 37a461e + a6712f6 commit 7e626d0
Showing 12 changed files with 7 additions and 266 deletions.
8 changes: 1 addition & 7 deletions config/config.yaml
@@ -127,7 +127,7 @@ filter_vcf:
germline: "config/filters/config_hard_filter_germline_vep105.yaml"

filter_fuseq_wes:
- min_support: 30
+ min_support: 50
filter_on_fusiondb: True

fuseq_wes:
@@ -170,9 +170,6 @@ gatk_mutect2_gvcf:
gatk_mutect2_merge_stats:
container: "docker://hydragenetics/gatk4:4.1.9.0"

gene_fuse:
container: "docker://hydragenetics/genefuse:0.6.1"

hotspot_report:
report_config: "config/reports/hotspot_report.yaml"
levels:
@@ -289,9 +286,6 @@ report_fusions:
star_fusion_low_support: 2
star_fusion_low_support_inframe: 6

report_gene_fuse:
min_unique_reads: 6

samtools_merge_bam:
extra: "-c -p"

12 changes: 0 additions & 12 deletions config/output_files.yaml
@@ -247,18 +247,6 @@ files:
types:
- T
- N
- name: GeneFuse report
input: fusions/report_gene_fuse/{sample}_{type}.gene_fuse_report.tsv
output: results/dna/{sample}_{type}/fusion/{sample}_{type}.gene_fuse_report.tsv
types:
- T
- N
- name: GeneFuse fusions TXT
input: fusions/gene_fuse/{sample}_{type}_gene_fuse_fusions.txt
output: results/dna/{sample}_{type}/additional_files/fusion/{sample}_{type}.gene_fuse_fusions.txt
types:
- T
- N
- name: ID-SNP VCF RNA
input: snv_indels/bcftools_id_snps/{sample}_{type}.id_snps.vcf
output: results/rna/{sample}_{type}/id_snps/{sample}_{type}.id_snps.vcf
46 changes: 1 addition & 45 deletions docs/dna_fusions.md
@@ -6,53 +6,8 @@ See the [fusions hydra-genetics module](https://hydra-genetics-fusions.readthedo

## Pipeline output files:

* `results/dna/fusion/{sample}_{type}.gene_fuse_report.tsv`
* `results/dna/fusion/{sample}_{type}.fuseq_wes.report.csv`

## Fusions calling using GeneFuse
DNA fusion calling is performed by **[GeneFuse](https://github.com/OpenGene/GeneFuse)** v0.6.1 on fastq-files. It uses a gene transcript target file to limit the number of targets to analyze.

### Configuration

**References**

* [Fasta reference](references.md#genefuse_fasta) genome
* [Gene transcript](references.md#genefuse_transcripts) file with genomic positions for all exons include in the analysis

<br />
**Cluster resources**

| **Options** | **Value** |
|-------------|-|
| mem_mb | 36864 |
| mem_per_cpu | 6144 |
| threads | 6 |
| time | "8:00:00" |

## GeneFuse Filtering and report
The output from GeneFuse is filtered and then reported into a fusion report using the in-house script [report_gene_fuse.py](https://github.com/genomic-medicine-sweden/Twist_Solid/blob/develop/workflow/scripts/report_gene_fuse.py) ([rule and config](softwares.md#report_gene_fuse)). The following filter criteria is used:

* Fusions must have at least 6 unique supporting reads.
* Very noisy fusion pairs found in almost all samples (defined in [`filter_fusions_20230214.csv`](references.md#genefuse_filter_fusions)) are removed:
- NPM1::ALK
- CLTC::NTRK3
- MSH2_ALK
- MSH2_HIP1
* Noisy fusion pairs found in some samples (defined in [`filter_fusions_20230214.csv`](references.md#genefuse_filter_fusions)) are filtered individually on the number of uniquely supporting reads:
- LMNA::EZR 9
- ABL1::STRN 7
- EZR::ALK 8
- RSPO2::BRAF 8
- LMNA::HIP1 12
- NPM1::BICC1 11
- RSPO2::ERG 13

### Result file

* `results/dna/fusion/{sample}_{type}.gene_fuse_report.tsv`

<br />

## Fusions calling using FuSeq_WES
DNA fusion calling is performed by **[FuSeq_WES](https://github.com/nghiavtr/FuSeq_WES)** v1.0.1 on bam-files. It uses a gene transcript target file to limit the number of targets to analyze.

@@ -91,6 +46,7 @@ The output from FuSeq_WES is filtered and then reported into a fusion report usi
|-------------|-|-|
| filter_on_fusiondb | True | Only keep fusions found in the fusion database |
| gene_white_list | [`fuseq_wes_gene_white_list.txt`](references.md#fuseq_wes_white_list) | Only keep fusions with at least one gene in the gene white list |
| gene_fusion_black_list | [`false_positive_fusion_pairs.txt`](references.md#gene_fusion_black_list) | Remove fusions pairs in fusion pair gene black list |
| gtf | [`hg19.refGene.gtf`](references.md#filter_report_fuseq_wes) | Transcript annotation |
| min_support | 30 | Minimal total number of supporting reads |
| transcript_black_list | [`fuseq_wes_transcript_black_list.txt`](references.md#fuseq_wes_transcript_black_list) | Transcripts that should not be used in annotation |
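
The filter options in the table above can be read as a chain of independent checks on each fusion call. The following is only a minimal sketch of that reading, not the pipeline's actual filtering script: the `Fusion` record, the `keep_fusion` helper, the inclusive `min_support` threshold, the order-independent black list match and the example gene names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fusion:
    """Hypothetical record for one fusion call (not the real FuSeq_WES format)."""
    gene1: str
    gene2: str
    support: int       # total number of supporting reads
    in_fusiondb: bool  # True if the pair is present in the fusion database


def keep_fusion(fusion, min_support, filter_on_fusiondb,
                gene_white_list, gene_fusion_black_list):
    """Return True if the fusion passes all configured filters.

    Assumes that all filters must pass, that min_support is an inclusive
    lower bound, and that black-listed pairs match in either gene order.
    """
    if fusion.support < min_support:
        return False
    if filter_on_fusiondb and not fusion.in_fusiondb:
        return False
    # Keep only fusions with at least one gene in the white list.
    if fusion.gene1 not in gene_white_list and fusion.gene2 not in gene_white_list:
        return False
    # Drop known false-positive pairs.
    pair = (fusion.gene1, fusion.gene2)
    if pair in gene_fusion_black_list or pair[::-1] in gene_fusion_black_list:
        return False
    return True


# Illustrative values only; the real lists are read from the configured files.
white_list = {"ALK"}
black_list = {("NPM1", "ALK")}
candidate = Fusion("EML4", "ALK", support=60, in_fusiondb=True)
passed = keep_fusion(candidate, 50, True, white_list, black_list)
```

Treating each option as a separate early return keeps the options independent, so a new list such as `gene_fusion_black_list` can be added without touching the other checks.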
4 changes: 3 additions & 1 deletion docs/references.md
@@ -79,6 +79,7 @@ The following reference files, panel of normals and design files are needed to r
|_ _| <div id="fuseq_wes_paralog_db">paralog database</div> | `ensmbl_paralogs_grch37.RData` |
| <div id="filter_report_fuseq_wes">filter_report_fuseq_wes</div> | transcript annotation | `hg19.refGene.gtf` |
| | <div id="fuseq_wes_white_list">gene white list</div> | `fuseq_wes_gene_white_list.txt` |
| | <div id="gene_fusion_black_list">fusion gene pair black list</div> | `false_positive_fusion_pairs.txt` |
|_ _| <div id="fuseq_wes_transcript_black_list">transcript black list</div> | `fuseq_wes_transcript_black_list.txt` |
| <div id="hotspot_file">hotspot_annotation</div> | hotspots | `Hotspots_combined_regions_nodups.csv` |
| <div id="hotspot_report">hotspot_report</div> | hotspot_mutations | `Hotspots_combined_regions_nodups.csv` |
@@ -296,7 +297,7 @@ singularity docker://hydragenetics/purecn:2.2.0 Rscript $PURECN/IntervalFile.R -
| intervals | Target interval file | File created by the command described above |

## Pipeline specific files
- These are design files and other pipeline specific only available to download from the Uppsala Owncloud solution.
+ These are design files and other pipeline specific files only available to download from out [git](https://github.com/genomic-medicine-sweden/Twist_Solid_pipeline_files) or the Uppsala Owncloud solution.

| File type | File | Description |
|-|-|-|
@@ -312,6 +313,7 @@ These are design files and other pipeline specific only available to download fr
| | `filter_fusions_20221114.csv` | Filtering criteria for false positive prone fusion partners |
| FuSeq_WES | `fuseq_params.txt` | Filtering parameters used by FuSeq_WES |
| FuSeq_WES_report | `fuseq_wes_gene_white_list.txt` | Gene list for filtering of fusion |
| | `false_positive_fusion_pairs.txt` | Gene list for filtering of fusion |
|_ _| `fuseq_wes_transcript_black_list.txt` | Transcripts that should not be used in annotation |
| CNVkit | `cnvkit_germline_blacklist_20221221.bed` | List of regions excluded from the germline vcf file |
| GATK CNV | `gnomad_SNP_0.001_target.annotated.interval_list` | Bed file with CNV backbone SNPs which are selected from <br />GnomAD with over 0.1% global population frequency |
23 changes: 0 additions & 23 deletions docs/softwares.md
@@ -208,29 +208,6 @@ Make a combined report of filtered fusion calls from all RNA fusion callers. App

---

## report_gene_fuse
The output from GeneFuse is filtered and then made into a fusion report. See further [DNA fusions report info](dna_fusions.md#filtering-and-report).

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__report_gene_fuse__report_gene_fuse#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__report_gene_fuse__report_gene_fuse#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__report_gene_fuse#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__report_gene_fuse#

---

## purecn_modify_vcf
Increases the MQB (mean base quality) value by 5 as the qualities are so bad for our samples.

1 change: 0 additions & 1 deletion workflow/Snakefile
@@ -15,7 +15,6 @@ include: "rules/fix_vcf_ad_for_qci.smk"
include: "rules/hotspot_report.smk"
include: "rules/house_keeping_gene_coverage.smk"
include: "rules/purecn_modify_vcf.smk"
include: "rules/report_gene_fuse.smk"
include: "rules/report_fusions.smk"


36 changes: 0 additions & 36 deletions workflow/rules/report_gene_fuse.smk

This file was deleted.

24 changes: 0 additions & 24 deletions workflow/schemas/config.schema.yaml
@@ -273,15 +273,6 @@ properties:
type: string
description: parameters that should be forwarded

gene_fuse:
type: object
properties:
genes:
type: string
description: path to gene files used by gene_fuse
required:
- genes

house_keeping_gene_coverage:
type: object
properties:
@@ -560,19 +551,6 @@ properties:
type: integer
description: lower limit of supporting reads to flag filter fusion that are inframe for StarFusion

report_gene_fuse:
description: report results from genefuse
type: object
properties:
filter_fusions:
type: string
description: file specifying fusions that should be filtered completely (value 0 in column 2) or have higher limit (value >0 in column 2)
min_unique_reads:
type: integer
description: lower limit of uniquely supporting reads to report fusion
required:
- min_unique_reads

trimmer_software:
description: trimmer software that should be used
pattern: "^(fastp_pe|None)$"
@@ -612,13 +590,11 @@ required:
- filter_vcf
- gatk_collect_allelic_counts
- gatk_denoise_read_counts
- gene_fuse
- hotspot_annotation
- hotspot_info
- hotspot_report
- msisensor_pro
- multiqc
- reference
- report_gene_fuse
- trimmer_software
- vep
24 changes: 2 additions & 22 deletions workflow/schemas/resources.schema.yaml
@@ -228,9 +228,9 @@ properties:
type: integer
description: number of threads to be available

- report_gene_fuse:
+ sample_mixup_check:
type: object
- description: resource definitions for generating a gene fuse report
+ description: resource definitions for sample_mixup_check
properties:
mem_mb:
type: integer
@@ -248,25 +248,5 @@ properties:
type: integer
description: number of threads to be available

sample_mixup_check:
type: object
description: resource definitions for sample_mixup_check
properties:
mem_mb:
type: integer
description: max memory in MB to be available
mem_per_cpu:
type: integer
description: memory in MB used per cpu
partition:
type: string
description: partition to use on cluster
threads:
type: integer
description: number of threads to be available
time:
type: string
description: max execution time

required:
- default_resources
19 changes: 0 additions & 19 deletions workflow/schemas/rules.schema.yaml
@@ -230,25 +230,6 @@ properties:
type: string
description: Excel frienly text report with filtered fusion calls from respective caller

report_gene_fuse:
type: object
description: input and output parameters for report_gene_fuse
properties:
input:
type: object
description: list of inputs
properties:
fusions:
type: string
description: Called fusions by GeneFuse
output:
type: object
description: list of outputs
properties:
report:
type: string
description: Excel friendly text report with filtered fusion calls from GeneFuse

purecn_modify_vcf:
type: object
description: input and output parameters for purecn_modify_vcf
12 changes: 0 additions & 12 deletions workflow/schemas/singularity.schema.yaml
@@ -206,17 +206,6 @@ properties:
required:
- container

gene_fuse:
type: object
properties:
container:
type: string
description: name or path to a default docker/singularity container
pattern: >-
hydragenetics/genefuse:0\.6\.1$|hydragenetics_genefuse_0\.6\.1\.sif$
required:
- container

manta_config_t:
type: object
properties:
@@ -455,7 +444,6 @@ required:
- gatk_mutect2
- gatk_mutect2_gvcf
- gatk_model_segments
- gene_fuse
- manta_config_t
- manta_run_workflow_t
- mosdepth_bed