Commit

docs: remove gpu from doc add cpu
elleira committed Apr 3, 2024
1 parent d16996f commit 3e8e72a
Showing 13 changed files with 1,026 additions and 889 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -10,13 +10,13 @@
This README is only a brief introduction; please refer to ReadTheDocs for the latest documentation.

---
![Lint](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/lint.yaml/badge.svg?branch=main)
![Snakefmt](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/snakefmt.yaml/badge.svg?branch=main)
![snakemake dry run](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/snakemake-dry-run.yaml/badge.svg?branch=main)
![integration test](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/integration1.yaml/badge.svg?branch=main)
![Lint](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/lint.yaml/badge.svg?branch=main)
![Snakefmt](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/snakefmt.yaml/badge.svg?branch=main)
![snakemake dry run](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/snakemake-dry-run.yaml/badge.svg?branch=main)
![integration test](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/integration1.yaml/badge.svg?branch=main)

![pycodestyle](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/pycodestyle.yaml/badge.svg?branch=main)
![pytest](https://github.com/clinical-genomics-uppsala/marple/actions/workflows/pytest.yaml/badge.svg?branch=main)
![pycodestyle](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/pycodestyle.yaml/badge.svg?branch=main)
![pytest](https://github.com/clinical-genomics-uppsala/marple_rd_tc/actions/workflows/pytest.yaml/badge.svg?branch=main)

[![License: GPL-3](https://img.shields.io/badge/License-GPL3-yellow.svg)](https://opensource.org/licenses/gpl-3.0.html)

@@ -40,7 +40,7 @@ $ snakemake -n -s ../../workflow/Snakefile --configfiles ../../config/config.yam
To run this pipeline, the `sample.tsv`, `units.tsv`, `resources.yaml`, and `config.yaml` files need to be available in the current directory (or otherwise specified in `config.yaml`). You always need to specify the `config`-file and the `sequenceid` variable in the command. To run the pipeline:

```bash
$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" -s /path/to/marple/workflow/Snakefile
$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" -s /path/to/marple_rd_tc/workflow/Snakefile
```

## :books: [Output files](https://marple-rd-tc.readthedocs.io/en/latest/result_files/)
2 changes: 2 additions & 0 deletions docs/includes/abbreviations.md
@@ -7,4 +7,6 @@
*[SNVs]: Single Nucleotide Variants
*[SV]: Structural Variant
*[VCF]: Variant Call Format
*[WES]: Whole Exome Sequencing
*[WGS]: Whole Genome Sequencing
*[QC]: Quality Control
900 changes: 477 additions & 423 deletions docs/includes/images/rulegraph.svg
27 changes: 13 additions & 14 deletions docs/includes/images/snv_indels.dot
@@ -3,25 +3,24 @@ digraph snakemake_dag {
graph[bgcolor=white, margin=0];
node[shape=box, style=rounded, fontname=sans, fontsize=10, penwidth=2];
edge[penwidth=2, color=grey];
0[label = "bam", color = "0.0 0.0 0.0", style="dotted"];
1[label = "vcf: annotated normalized", color = "0.0 0.0 0.0", style="dotted"];
2[label = "vcf: genome", color = "0.0 0.0 0.0", style="dotted"];
3[label = "annotation_vep", color = "0.06 0.6 0.85", style="rounded"];
bam[label = "bam", color = "0.0 0.0 0.0", style="dotted"];
vcf[label = "vcf: annotated normalized", color = "0.0 0.0 0.0", style="dotted"];
gvcf[label = "vcf: genome", color = "0.0 0.0 0.0", style="dotted"];
vep[label = "annotation_vep", color = "0.06 0.6 0.85", style="rounded"];
4[label = "add_ref_to_vcf", color = "0.52 0.6 0.85", style="rounded"];
5[label = "snv_indels_bcftools_sort", color = "0.36 0.6 0.85", style="rounded"];
6[label = "snv_indels_vt_normalize", color = "0.52 0.6 0.85", style="rounded"];
7[label = "snv_indels_vt_decompose", color = "0.10 0.6 0.85", style="rounded"];
8[label = "snv_indels_fix_af", color = "0.36 0.6 0.85", style="rounded"];
9[label = "parabricks_pbrun_deepvariant", color = "0.53 0.6 0.85", style="rounded"];
fix_af[label = "snv_indels_fix_af", color = "0.36 0.6 0.85", style="rounded"];
deepvariant[label = "snv_indels_deepvariant", color = "0.53 0.6 0.85", style="rounded"];

0 -> 9
8 -> 7
bam -> deepvariant
fix_af -> 7
7 -> 6
6 -> 5
5 -> 3
3 -> 4
4 -> 1
9 -> 8
8 -> 3
3 -> 2
5 -> vep
vep -> 4
4 -> vcf
deepvariant -> fix_af
fix_af -> gvcf
}
Binary file modified docs/includes/images/snv_indels.png
6 changes: 1 addition & 5 deletions docs/index.md
@@ -14,14 +14,10 @@ Marple :woman_detective: uses the following hydra genetics modules:
- [Alignment](https://github.com/hydra-genetics/alignment/tree/v0.4.0)
- [Annotation](https://github.com/hydra-genetics/annotation/tree/v0.3.0)
- [CNV](https://github.com/hydra-genetics/cnv_sv/tree/78f270c)
- [Parabricks](https://github.com/hydra-genetics/parabricks/tree/v1.1.0)
- [Prealignment](https://github.com/hydra-genetics/prealignment/tree/v1.0.0)
- [SNV indels](https://github.com/hydra-genetics/snv_indels/tree/v0.5.0)
- [SNV indels](https://github.com/hydra-genetics/snv_indels/tree/3935ecf)
- [QC](https://github.com/hydra-genetics/qc/tree/ca947b1)

!!! warning
As of now a GPU with licensed Parabricks is needed ro run SNV calling in Marple. A non-licensed CPU alternative will be added at a later stage.


### :judge: Rulegraph
![dag plot](includes/images/rulegraph.svg){: style="height:100%;width:100%"}
3 changes: 1 addition & 2 deletions docs/running.md
@@ -2,7 +2,6 @@
## :exclamation: Requirements

### Recommended hardware
- [GPU Nividia A30 with Parabricks v4.1.1-1 installed](https://docs.nvidia.com/clara/parabricks/latest/whatsnew/whatsnew4.1.1-1.html)
- CPU: >10 cores per sample
- Memory: 6GB per core

@@ -49,7 +48,7 @@ source virtual/environment/bin/activate
# Install requirements
pip install -r requirements.txt
```
This will install all software required to run the pipeline in a virtual environment, which you will have to remember to activate before running the pipeline each time.
This will install all software required to run the pipeline in a virtual environment, which you will have to activate before running the pipeline each time.

## :books: Input files
Four different files need to be adapted to your compute environment and sequence data, `samples.tsv`, `units.tsv`, `config.yaml` and `resources.yaml`.
4 changes: 2 additions & 2 deletions docs/running_ref.md
@@ -6,7 +6,7 @@ To generate `.bam` **and** `.bai`-files for all samples you need to run Marple u

```bash
# Run snakemake command with the extra config parameter called sequenceid
snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="230202-test" -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed
snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="normal_samples" -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed

```
### :books: Input files
@@ -35,7 +35,7 @@ An `resources.yaml` file can also be found in the `config/`-folder. This is adap

### :rocket: Run command
```bash
#Activate the virtual enviorment
#Activate the virtual environment
source virtual/environment/bin/activate

# Run snakemake command
17 changes: 17 additions & 0 deletions docs/softwares.md
@@ -68,6 +68,23 @@ Rules that creates a `.xlsx` file per sample with aggregated coverage informatio

#RESOURCESSCHEMA__export_qc_bedtools_intersect#

### :snake: Rule

#SNAKEMAKE_RULE_SOURCE__export_qc__export_qc_bedtools_intersect_pgrs#

#### :left_right_arrow: input / output files

#SNAKEMAKE_RULE_TABLE__export_qc__export_qc_bedtools_intersect_pgrs#

### :wrench: Configuration

#### Software settings (`config.yaml`)

#CONFIGSCHEMA__export_qc_bedtools_intersect_pgrs#

#### Resources settings (`resources.yaml`)

#RESOURCESSCHEMA__export_qc_bedtools_intersect_pgrs#

### :snake: Rule

19 changes: 6 additions & 13 deletions docs/steps.md
@@ -1,9 +1,9 @@
# Steps in Marple :woman_detective:
To go into the details of the pipeline, we divide it into modules similar to the Hydra-Genetics module system.
To go into the details of the pipeline, we divide it into modules similar to the Hydra-Genetics module system. Default hydra-genetics settings/resources are used if no configuration is specified.

---
## Prealignment
See the **Prealignment** hydra-genetics module documentation on [ReadTheDocs](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/prealignment) for more details on the software. Default hydra-genetics settings/resources are used if no configuration is specified.
See the **Prealignment** hydra-genetics module documentation on [ReadTheDocs](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/prealignment/tree/v1.0.0) for more details on the software.

![dag plot](includes/images/prealignment.png){: style="height:30%;width:30%"}

@@ -63,31 +63,24 @@ Bamfile indexing is performed by **[samtools index](http://www.htslib.org/doc/sa

---
## SNV indels
SNVs and indels are called using the **Parabricks** ([github](https://github.com/hydra-genetics/parabricks/tree/v1.1.0)) and **SNV_indels** ([ReadTheDocs](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/snv_indels/tree/v0.5.0)) modules. Annotation is then done with the **Annotation** module ([ReadTheDocs](https://hydra-genetics-annotation.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/annotation/tree/v0.3.0)).
SNVs and indels are called using the **SNV_indels** ([ReadTheDocs](https://hydra-genetics-snv-indels.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/snv_indels/tree/3935ecf)) module. Annotation is then done with the **Annotation** module ([ReadTheDocs](https://hydra-genetics-annotation.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/annotation/tree/v0.3.0)).

![dag plot](includes/images/snv_indels.png){: style="height:100%;width:100%"}

!!! warning
As of now a GPU with licensed Parabricks is needed ro run SNV calling. A non-licensed CPU alternative will be added at a later stage.

### Pipeline output files

* `Results/{sample}_{sequenceid}/{sample}_{sequenceid}.vcf.gz`
* `Results/{sample}_{sequenceid}/{sample}_{sequenceid}.genome.vcf.gz`

### SNV calling
#### GPU track
Variants are called using [**Parabricks deepvariant** v4.1.1-1](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_deepvariant.html#man-deepvariant) on a GPU licensed for Parabricks. `pbrun_deepvariant` is run with the interval file `config["refernce"]["design_bed"]` and the extra parameters defined in `config.yaml` (`--use-wes-model --disable-use-window-selector-model --gvcf `). This ensures that a genome vcf is produced as well as a standard vcf, by using `disable-use-window-selector-model` we increases reproducibility for later implementation of a parallel CPU-track. The AF field is added to the `INFO` column in the vcf:s using the `fix_af.py` from the snv_indel module. The vcf header in the standard vcf is also updated to include a reference line using the `add_ref_to_vcf.py` to ensure that programs such as Alissa acknowledge the use of Hg38.

#### CPU track
!!! note
To be added.
Variants are called using [Deepvariant](https://github.com/google/deepvariant). Deepvariant is run per chromosome over the regions defined in `config["references"]["design_bed"]`. The model type is set to "WES" and `--output_gvcf` is used to ensure that both a genome vcf and a standard vcf are produced. The AF field is added to the `INFO` column in both vcf files using `fix_af.py`. The vcf header in the standard vcf is also updated to include a reference line using `add_ref_to_vcf.py`, to ensure that programs such as Alissa acknowledge the use of Hg38.
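
The two post-processing steps described above can be pictured with a minimal Python sketch. This is not the code of `fix_af.py` or `add_ref_to_vcf.py`; the function names are hypothetical, and the sketch assumes a single-sample record whose `FORMAT` column contains an `AD` (allelic depth) field. A complete implementation would also add the matching `##INFO=<ID=AF,...>` header line.

```python
def add_reference_header(vcf_lines, reference):
    """Insert a ##reference line just before the #CHROM line (hypothetical helper)."""
    if any(line.startswith("##reference=") for line in vcf_lines):
        return list(vcf_lines)  # header already present, nothing to do
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            out.append(f"##reference={reference}")
        out.append(line)
    return out


def add_af_to_info(record):
    """Append AF (alt depth / total depth) to INFO, assuming FORMAT has an AD field."""
    fields = record.split("\t")
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    depths = [int(d) for d in sample["AD"].split(",")]
    total = sum(depths)
    # One AF value per ALT allele; 0.0 when there is no coverage at all.
    af = ",".join(f"{(d / total if total else 0.0):.4f}" for d in depths[1:])
    fields[7] = f"AF={af}" if fields[7] in (".", "") else f"{fields[7]};AF={af}"
    return "\t".join(fields)
```

Both helpers work on plain text lines, so they can be tried on a few lines of an uncompressed vcf without any extra dependencies.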

### Normalizing
The standard vcf files are decomposed with [**vt decompose**](https://genome.sph.umich.edu/wiki/Vt#Decompose) followed by [**vt decompose_blocksub**](https://genome.sph.umich.edu/wiki/Vt#Decompose_biallelic_block_substitutions) v2015.11.10. The decomposed vcf files are then normalized by [**vt normalize**](https://genome.sph.umich.edu/wiki/Vt#Normalization) v2015.11.10.
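
As a rough illustration of what decomposition and normalization do to a record, the Python sketch below splits a multiallelic site into biallelic ones and trims bases shared by REF and ALT. It is a simplification and not vt's implementation: the function names are hypothetical, genotype and `INFO` fields are ignored, and the left-alignment of indels against the reference genome that `vt normalize` performs is omitted.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Trim identical trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim identical leading bases, shifting the position accordingly.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose(pos, ref, alts):
    """Split one multiallelic site into one trimmed biallelic record per ALT."""
    return [trim_alleles(pos, ref, alt) for alt in alts]
```

For example, `decompose(100, "CAG", ["CTG", "C"])` yields a SNV at position 101 (`A` to `T`) and leaves the deletion `CAG` to `C` at position 100 unchanged.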

### Annotation
Both the normalized standard VCF files and the genome vcf files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109.3. Vep is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.
The normalized standard VCF files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109.3. Vep is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.

See the [annotation hydra-genetics module](https://hydra-genetics-annotation.readthedocs.io/en/latest/) for additional information.

@@ -104,7 +97,7 @@ CNVs are called using the Hydra-Genetics **CNV_SV** module ([ReadTheDocs](https:
* `Results/{sample}_{sequenceid}/{sample}_{sequenceid}_exomedepth.aed`

### Exomedepth
To call larger structural variants, **[Exomedepth](https://cran.r-project.org/web/packages/ExomeDepth/index.html)** v1.1.15 is used. Exomedepth does **not** use a window approach but evaluates each row in the bedfile as a segment; therefore the bedfile needs to be split into appropriately sized windows (e.g. using `bedtools makewindows`). Exomedepth also needs an `RData` file containing the normal pool, which can be created using the [Marple - references workflow](/running_ref). Lines with no-change calls (`reads.ratio == 1`) are removed from the output for Alissa compatibility.
To call larger structural variants, **[Exomedepth](https://cran.r-project.org/web/packages/ExomeDepth/index.html)** v1.1.15 is used. Exomedepth does **not** use a window approach but evaluates each row in the bedfile as a segment; therefore the bedfile needs to be split into appropriately sized windows (e.g. using `bedtools makewindows`). Exomedepth also needs an `RData` file containing the normal pool, which can be created using the [Marple - references workflow](/running_ref). Lines with no-change calls (`reads.ratio == 1`) are removed from the output for Alissa compatibility. Since no sex chromosomes are included in the design, Exomedepth is run with the same normal pool regardless of the sample's biological sex. Marple is designed on HG38; therefore a genes file and an exon file are also needed for annotation.
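
The windowing and filtering mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the window size is arbitrary, and the output rows are assumed to be dictionaries with a `reads.ratio` key. The pipeline itself relies on `bedtools makewindows` and its own export step.

```python
def make_windows(chrom, start, end, size):
    """Split one bed interval into consecutive windows of at most `size` bases."""
    # The last window is shorter when the interval length is not a multiple of size.
    return [(chrom, s, min(s + size, end)) for s in range(start, end, size)]


def drop_no_change(rows):
    """Remove calls whose reads.ratio equals 1, i.e. no copy number change."""
    return [row for row in rows if float(row["reads.ratio"]) != 1.0]
```

Applied to a 250-base interval with a window size of 100, `make_windows` returns three rows, each of which Exomedepth would then evaluate as its own segment.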

---
## QC