docs: remove gpu from doc add cpu

clinical-genomics-uppsala · Apr 3, 2024 · 3e8e72a
1 parent d16996f
commit 3e8e72a

Showing 13 changed files with 1,026 additions and 889 deletions.
diff --git a/README.md b/README.md
@@ -10,13 +10,13 @@
 This ReadMe is only a brief introduction, please refer to ReadTheDocs for the latest documentation. 
 
 ---
-![Lint](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/lint.yaml/badge.svg?branch=main)
-![Snakefmt](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/snakefmt.yaml/badge.svg?branch=main)
-![snakemake dry run](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/snakemake-dry-run.yaml/badge.svg?branch=main)
-![integration test](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/integration1.yaml/badge.svg?branch=main)
+![Lint](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/lint.yaml/badge.svg?branch=main)
+![Snakefmt](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/snakefmt.yaml/badge.svg?branch=main)
+![snakemake dry run](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/snakemake-dry-run.yaml/badge.svg?branch=main)
+![integration test](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/integration1.yaml/badge.svg?branch=main)
 
-![pycodestyle](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/pycodestyle.yaml/badge.svg?branch=main)
-![pytest](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/pytest.yaml/badge.svg?branch=main)
+![pycodestyle](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/pycodestyle.yaml/badge.svg?branch=main)
+![pytest](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/pytest.yaml/badge.svg?branch=main)
 
 [![License: GPL-3](https://img.shields.io/badge/License-GPL3-yellow.svg)](https://opensource.org/licenses/gpl-3.0.html)
 
@@ -40,7 +40,7 @@ $ snakemake -n -s ../../workflow/Snakefile --configfiles ../../config/config.yam
 To run this pipeline, the `samples.tsv`, `units.tsv`, `resources.yaml`, and `config.yaml` files need to be available in the current directory (or otherwise specified in `config.yaml`). You always need to specify the `config`-file and the `sequenceid` variable in the command. To run the pipeline:
 
 ```bash
-$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" -s /path/to/marple/workflow/Snakefile
+$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" -s /path/to/marple_rd_tc/workflow/Snakefile
 ```
 
 ## :books: [Output files](https://marple-rd-tc.readthedocs.io/en/latest/result_files/)

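Note on the README hunk above: the four input files it lists can be checked before Snakemake is launched. A minimal sketch, assuming the file names from the text (with `samples.tsv` spelled as in `docs/running.md`) and that they sit in one directory; `missing_inputs` and `REQUIRED_INPUTS` are hypothetical names and not part of Marple.

```python
from pathlib import Path

# File names taken from the README text; adjust them if config.yaml
# points to other locations (assumption: all four live in one directory).
REQUIRED_INPUTS = ("samples.tsv", "units.tsv", "resources.yaml", "config.yaml")


def missing_inputs(directory=".", required=REQUIRED_INPUTS):
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        raise SystemExit("Missing input files: " + ", ".join(missing))
    print("All input files found.")
```

Running the check from the analysis directory before the `snakemake` command gives an early, readable error instead of a failure inside the workflow.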
diff --git a/docs/includes/abbreviations.md b/docs/includes/abbreviations.md
@@ -7,4 +7,6 @@
 *[SNVs]: Single Nucleotide Variants
 *[SV]: Structural Variant
 *[VCF]: Variant Call Format
+*[WES]: Whole Exome Sequencing
+*[WGS]: Whole Genome Sequencing
 *[QC]: Quality Control
diff --git a/docs/includes/images/rulegraph.svg b/docs/includes/images/rulegraph.svg
diff --git a/docs/includes/images/snv_indels.dot b/docs/includes/images/snv_indels.dot
@@ -3,25 +3,24 @@ digraph snakemake_dag {
     graph[bgcolor=white, margin=0];
     node[shape=box, style=rounded, fontname=sans,                 fontsize=10, penwidth=2];
     edge[penwidth=2, color=grey];
-	0[label = "bam", color = "0.0 0.0 0.0", style="dotted"];
-	1[label = "vcf: annotated normalized", color = "0.0 0.0 0.0", style="dotted"];
-	2[label = "vcf: genome", color = "0.0 0.0 0.0", style="dotted"];
-	3[label = "annotation_vep", color = "0.06 0.6 0.85", style="rounded"];
+	bam[label = "bam", color = "0.0 0.0 0.0", style="dotted"];
+	vcf[label = "vcf: annotated normalized", color = "0.0 0.0 0.0", style="dotted"];
+	gvcf[label = "vcf: genome", color = "0.0 0.0 0.0", style="dotted"];
+	vep[label = "annotation_vep", color = "0.06 0.6 0.85", style="rounded"];
 	4[label = "add_ref_to_vcf", color = "0.52 0.6 0.85", style="rounded"];
 	5[label = "snv_indels_bcftools_sort", color = "0.36 0.6 0.85", style="rounded"];
 	6[label = "snv_indels_vt_normalize", color = "0.52 0.6 0.85", style="rounded"];
 	7[label = "snv_indels_vt_decompose", color = "0.10 0.6 0.85", style="rounded"];
-	8[label = "snv_indels_fix_af", color = "0.36 0.6 0.85", style="rounded"];
-	9[label = "parabricks_pbrun_deepvariant", color = "0.53 0.6 0.85", style="rounded"];
+	fix_af[label = "snv_indels_fix_af", color = "0.36 0.6 0.85", style="rounded"];
+	deepvariant[label = "snv_indels_deepvariant", color = "0.53 0.6 0.85", style="rounded"];
 
-	0 -> 9
-	8 -> 7
+	bam -> deepvariant
+	fix_af -> 7
 	7 -> 6
 	6 -> 5
-	5 -> 3
-	3 -> 4
-	4 -> 1
-	9 -> 8
-	8 -> 3
-	3 -> 2
+	5 -> vep
+	vep -> 4
+	4 -> vcf
+	deepvariant -> fix_af
+	fix_af -> gvcf
 }            
diff --git a/docs/includes/images/snv_indels.png b/docs/includes/images/snv_indels.png
diff --git a/docs/index.md b/docs/index.md
@@ -14,14 +14,10 @@ Marple :woman_detective: uses the following hydra genetics modules:
 - [Alignment](https://github.com/hydra-genetics/alignment/tree/v0.4.0)
 - [Annotation](https://github.com/hydra-genetics/annotation/tree/v0.3.0)
 - [CNV](https://github.com/hydra-genetics/cnv_sv/tree/78f270c)
-- [Parabricks](https://github.com/hydra-genetics/parabricks/tree/v1.1.0)
 - [Prealignment](https://github.com/hydra-genetics/prealignment/tree/v1.0.0)
-- [SNV indels](https://github.com/hydra-genetics/snv_indels/tree/v0.5.0)
+- [SNV indels](https://github.com/hydra-genetics/snv_indels/tree/3935ecf)
 - [QC](https://github.com/hydra-genetics/qc/tree/ca947b1)
 
-!!! warning
-    As of now a GPU with licensed Parabricks is needed ro run SNV calling in Marple. A non-licensed CPU alternative will be added at a later stage.
-
 
 ### :judge: Rulegraph 
 ![dag plot](includes/images/rulegraph.svg){: style="height:100%;width:100%"}

diff --git a/docs/running.md b/docs/running.md
@@ -2,7 +2,6 @@
 ## :exclamation: Requirements
 
 ### Recommended hardware 
- - [GPU Nividia A30 with Parabricks v4.1.1-1 installed](https://docs.nvidia.com/clara/parabricks/latest/whatsnew/whatsnew4.1.1-1.html)
  - CPU: >10 cores per sample
  - Memory: 6GB per core
 
@@ -49,7 +48,7 @@ source virtual/environment/bin/activate
 # Install requirements
 pip install -r requirements.txt
 ```
-This will install all required softwares needed to run the pipeline in an virtual environment which you will have to remember to activate before running the pipeline each time. 
+This will install all software required to run the pipeline in a virtual environment, which you will have to activate each time before running the pipeline.
 
 ## :books: Input files 
 Four different files need to be adapted to your compute environment and sequence data, `samples.tsv`, `units.tsv`, `config.yaml` and `resources.yaml`.

diff --git a/docs/running_ref.md b/docs/running_ref.md
@@ -6,7 +6,7 @@ To generate `.bam` **and** `.bai`-files for all samples you need to run Marple u
 
 ```bash
 # Run snakemake command with the extra config parameter called sequenceid
-snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="230202-test" -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed
+snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="normal_samples" -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed
 
 ```
 ### :books: Input files 
@@ -35,7 +35,7 @@ An `resources.yaml` file can also be found in the `config/`-folder. This is adap
 
 ### :rocket: Run command 
 ```bash
-#Activate the virtual enviorment
+#Activate the virtual environment
 source virtual/environment/bin/activate
 
 # Run snakemake command

diff --git a/docs/softwares.md b/docs/softwares.md
@@ -68,6 +68,23 @@ Rules that creates a `.xlsx` file per sample with aggregated coverage informatio
 
 #RESOURCESSCHEMA__export_qc_bedtools_intersect#
 
+### :snake: Rule
+
+#SNAKEMAKE_RULE_SOURCE__export_qc__export_qc_bedtools_intersect_pgrs#
+
+#### :left_right_arrow: input / output files
+
+#SNAKEMAKE_RULE_TABLE__export_qc__export_qc_bedtools_intersect_pgrs#
+
+### :wrench: Configuration
+
+#### Software settings (`config.yaml`)
+
+#CONFIGSCHEMA__export_qc_bedtools_intersect_pgrs#
+
+#### Resources settings (`resources.yaml`)
+
+#RESOURCESSCHEMA__export_qc_bedtools_intersect_pgrs#
 
 ### :snake: Rule
 

diff --git a/docs/steps.md b/docs/steps.md
@@ -1,9 +1,9 @@
 # Steps in Marple :woman_detective:
-To go into details of the pipeline we dived the pipeline into modules similar to Hydra-Genetics module system.
+To describe the pipeline in detail, we have divided it into modules similar to the Hydra-Genetics module system. Default hydra-genetics settings/resources are used if no configuration is specified.
 
 ---
 ## Prealignment
-See the **Prealignment** hydra-genetics module documentation on [ReadTheDoc](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/prealignment) documentation for more details on the softwares. Default hydra-genetics settings/resources are used if no configuration is specified.
+See the **Prealignment** hydra-genetics module documentation on [ReadTheDocs](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [GitHub](https://github.com/hydra-genetics/prealignment/tree/v1.0.0) for more details on the software.
 
 ![dag plot](includes/images/prealignment.png){: style="height:30%;width:30%"}
 
@@ -63,31 +63,24 @@ Bamfile indexing is performed by **[samtools index](http://www.htslib.org/doc/sa
 
 ---
 ## SNV indels
-SNV and indels are called using the **Parabricks** ([github](https://github.com/hydra-genetics/parabricks/tree/v1.1.0)) and **SNV_indels** ([ReadTheDoc](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/snv_indels/tree/v0.5.0)) modules. Annotation is then done with **Annotation** module ([ReadTheDocs](https://hydra-genetics-annotation.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/annotation/tree/v0.3.0)).
+SNVs and indels are called using the **SNV_indels** ([ReadTheDocs](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) or [GitHub](https://github.com/hydra-genetics/snv_indels/tree/3935ecf)) module. Annotation is then done with the **Annotation** module ([ReadTheDocs](https://hydra-genetics-annotation.readthedocs.io/en/latest/) or [GitHub](https://github.com/hydra-genetics/annotation/tree/v0.3.0)).
 
 ![dag plot](includes/images/snv_indels.png){: style="height:100%;width:100%"}
 
-!!! warning
-    As of now a GPU with licensed Parabricks is needed ro run SNV calling. A non-licensed CPU alternative will be added at a later stage.
 
 ### Pipeline output files
 
 * `Results/{sample}_{sequenceid}/{sample}_{sequenceid}.vcf.gz`
 * `Results/{sample}_{sequenceid}/{sample}_{sequenceid}.genome.vcf.gz`
 
 ### SNV calling
-#### GPU track
-Variants are called using [**Parabricks deepvariant** v4.1.1-1](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_deepvariant.html#man-deepvariant) on a GPU licensed for Parabricks. `pbrun_deepvariant` is run with the interval file `config["refernce"]["design_bed"]` and the extra parameters defined in `config.yaml` (`--use-wes-model --disable-use-window-selector-model --gvcf `). This ensures that a genome vcf is produced as well as a standard vcf, by using `disable-use-window-selector-model` we increases reproducibility for later implementation of a parallel CPU-track. The AF field is added to the `INFO` column in the vcf:s using the `fix_af.py` from the snv_indel module. The vcf header in the standard vcf is also updated to include a reference line using the `add_ref_to_vcf.py` to ensure that programs such as Alissa acknowledge the use of Hg38.
-
-#### CPU track
-!!! note
-    To be added.
+Variants are called using [DeepVariant](https://github.com/google/deepvariant). DeepVariant is run per chromosome over the regions defined in `config["references"]["design_bed"]`. The model type is set to "WES" and `--output_gvcf` is used to ensure that both a genome VCF and a standard VCF are produced. The AF field is added to the `INFO` column in the VCFs using `fix_af.py`. The VCF header in the standard VCF is also updated to include a reference line using `add_ref_to_vcf.py` to ensure that programs such as Alissa acknowledge the use of Hg38.
 
 ### Normalizing
 The standard VCF files are decomposed with [**vt decompose**](https://genome.sph.umich.edu/wiki/Vt#Decompose) followed by [**vt decompose_blocksub**](https://genome.sph.umich.edu/wiki/Vt#Decompose_biallelic_block_substitutions) v2015.11.10. The decomposed VCF files are then normalized by [**vt normalize**](https://genome.sph.umich.edu/wiki/Vt#Normalization) v2015.11.10.
 
 ### Annotation
-Both the normalized standard VCF files and the genome vcf files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109.3. Vep is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.
+The normalized standard VCF files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109.3. VEP is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.
 
 See the [annotation hydra-genetics module](https://hydra-genetics-annotation.readthedocs.io/en/latest/) for additional information.
 
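Note on the SNV calling text in the hunk above: the per-chromosome run it describes can be pictured as one DeepVariant invocation per chromosome of the design BED. A rough sketch, assuming the `run_deepvariant` entry point that ships with DeepVariant and its `--model_type`, `--ref`, `--reads`, `--regions`, `--output_vcf` and `--output_gvcf` flags. The helper names, file paths and output naming are hypothetical, and the actual hydra-genetics rule may be wired differently.

```python
from collections import defaultdict


def split_bed_by_chromosome(bed_lines):
    """Group BED records by chromosome, keeping the order of first appearance."""
    per_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split("\t")[0]
        per_chrom[chrom].append(line)
    return dict(per_chrom)


def deepvariant_command(chrom, bed_path, bam, ref, out_prefix):
    """Build the argument list for one per-chromosome DeepVariant run.

    `bed_path` is assumed to hold only the design regions of `chrom`;
    the output file naming is illustrative only.
    """
    return [
        "run_deepvariant",
        "--model_type=WES",
        f"--ref={ref}",
        f"--reads={bam}",
        f"--regions={bed_path}",
        f"--output_vcf={out_prefix}.{chrom}.vcf.gz",
        f"--output_gvcf={out_prefix}.{chrom}.g.vcf.gz",
    ]
```

Each command list could be passed to `subprocess.run`, and the per-chromosome VCFs merged afterwards; the sketch stops before that step because the merge tool is not named in the text.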
@@ -104,7 +97,7 @@ CNVs are called using the Hydra-Genetics **CNV_SV** module ([ReadTheDocs](https:
 * `Results/{sample}_{sequenceid}/{sample}_{sequenceid}_exomedepth.aed`
 
 ### Exomedepth
-To call larger structural variants **[Exomedepth](https://cran.r-project.org/web/packages/ExomeDepth/index.html)** v1.1.15 is used. Exomedepth does **not** use a window approach but evaluates each row in the bedfile as a segment, therefor the bedfile need to be split into appropriate large windows (e.g. using `bedtools makewindows`). Exomedepth also need a `RData` file containing the normal pool, this can be created using the [Marple - references workflow](/running_ref). Lines with no-change calls (`reads.ratio == 1`) are removed from the output for Alissa compatibility. 
+To call larger structural variants **[Exomedepth](https://cran.r-project.org/web/packages/ExomeDepth/index.html)** v1.1.15 is used. Exomedepth does **not** use a window approach but evaluates each row in the bedfile as a segment; therefore the bedfile needs to be split into appropriately large windows (e.g. using `bedtools makewindows`). Exomedepth also needs an `RData` file containing the normal pool, which can be created using the [Marple - references workflow](/running_ref). Lines with no-change calls (`reads.ratio == 1`) are removed from the output for Alissa compatibility. Since no sex chromosomes are included in the design, Exomedepth is run with the same normal pool regardless of the sample's biological sex. Marple is designed on HG38; therefore a genes file and an exons file are also needed for annotation.
 
 ---
 ## QC
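Note on the Exomedepth hunk above: splitting the bedfile into windows means cutting each interval into fixed-size pieces, with a shorter piece at the end when the interval length is not a multiple of the window size. A minimal sketch of that idea in Python, assuming half-open BED coordinates; `make_windows` and the window size of 100 are hypothetical, and for real data `bedtools makewindows` is the tool named in the text.

```python
def make_windows(chrom, start, end, window_size):
    """Split the half-open interval [start, end) into consecutive windows.

    The last window is truncated at `end`, so it may be shorter than
    `window_size`.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (chrom, win_start, min(win_start + window_size, end))
        for win_start in range(start, end, window_size)
    ]


if __name__ == "__main__":
    for chrom, win_start, win_end in make_windows("chr1", 0, 250, 100):
        print(f"{chrom}\t{win_start}\t{win_end}")
```

Because Exomedepth treats every BED row as one segment, the chosen window size directly sets the resolution of the CNV calls.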