diff --git a/config/config.yaml b/config/config.yaml
index ff5225e..564214e 100644
--- a/config/config.yaml
+++ b/config/config.yaml
@@ -13,7 +13,7 @@ modules:
   snv_indels: "v0.5.0"
   qc: "ca947b1"
 
-default_container: "docker://hydragenetics/common:0.4.0"
+default_container: "docker://hydragenetics/common:1.8.1"
 
 reference:
   fasta: "/data/ref_genomes/GRCh38/reference_grasnatter/homo_sapiens.fasta"
diff --git a/config/multiqc_config.yaml b/config/multiqc_config.yaml
index 01eebf3..c9d4e79 100644
--- a/config/multiqc_config.yaml
+++ b/config/multiqc_config.yaml
@@ -91,7 +91,7 @@ table_columns_visible:
     PCT_30X: False
     PCT_TARGET_BASES_30X: False
     FOLD_ENRICHMENT: False
-    TOTAL_READS: True
+    TOTAL_READS: False
   Samtools:
     error_rate: False
     non-primary_alignments: False
@@ -99,7 +99,7 @@ table_columns_visible:
     reads_mapped_percent: True
     reads_properly_paired_percent: True
     reads_MQ0_percent: False
-    raw_total_sequences: False #only on bedfile not total of fastq, bases on target only
+    raw_total_sequences: True #only on bedfile not total of fastq, bases on target only
 
 # Patriks plug in, addera egna columner till general stats
 multiqc_cgs:
diff --git a/docs/result_files.md b/docs/result_files.md
index 7c7c763..61b8218 100644
--- a/docs/result_files.md
+++ b/docs/result_files.md
@@ -39,8 +39,8 @@ The general statistics table are ordered based on the sample order in `SampleShe
 
 | Column Name | Origin | Comment |
 | --- | --- | --- |
-| K Reads | [Picard](https://broadinstitute.github.io/picard/) HSMetrics | Total number of reads in inputfile (`alignment/samtools_merge_bam/{sample}_{type}.bam`) |
-| % Mapped| [Samtools stats](http://www.htslib.org/doc/samtools-stats.html) | Only reads on target (`config[reference][design_bed]`) |
+| K Reads | [Samtools stats](http://www.htslib.org/doc/samtools-stats.html) | Total number of reads in inputfile (`alignment/samtools_merge_bam/{sample}_{type}.bam`) |
+| % Mapped| [Samtools stats](http://www.htslib.org/doc/samtools-stats.html) | Percent reads mapped, anywhere in the reference (no design file used) |
 | % Proper pairs| [Samtools stats](http://www.htslib.org/doc/samtools-stats.html) | Only reads on target (`config[reference][design_bed]`) |
 | Average Quality | [Samtools stats](http://www.htslib.org/doc/samtools-stats.html) | Ratio between sum of base quality over total length. Only reads on target (`config[reference][design_bed]`) |
 | Median | [Mosdepth](https://github.com/brentp/mosdepth) | Median Coverage over coding exon in design (`config[reference][exon_bed]`) |
diff --git a/docs/steps.md b/docs/steps.md
index e36dada..b97d4c0 100644
--- a/docs/steps.md
+++ b/docs/steps.md
@@ -3,7 +3,7 @@ To go into details of the pipeline we dived the pipeline into modules similar to
 
 ---
 ## Prealignment
-See the **Prealignment** hydra-genetics module documentation on [ReadTheDoc](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [github]() documentation for more details on the softwares. Default hydra-genetics settings/resources are used if no configuration is specified.
+See the **Prealignment** hydra-genetics module documentation on [ReadTheDoc](https://hydra-genetics-prealignment.readthedocs.io/en/latest/) or [github](https://github.com/hydra-genetics/prealignment) documentation for more details on the softwares. Default hydra-genetics settings/resources are used if no configuration is specified.
 
 ![dag plot](includes/images/prealignment.png){: style="height:30%;width:30%"}
 
@@ -87,7 +87,7 @@ Variants are called using [**Parabricks deepvariant** v4.1.1-1](https://docs.nvi
 The standard vcf files is decomposed with [**vt decompose**](https://genome.sph.umich.edu/wiki/Vt#Decompose) followed by [**vt decompose_blocksub**](https://genome.sph.umich.edu/wiki/Vt#Decompose_biallelic_block_substitutions) v2015.11.10. The decomposed vcf files are then normalized by [**vt normalize**](https://genome.sph.umich.edu/wiki/Vt#Normalization) v2015.11.10.
 
 ### Annotation
-Both the normalized standard VCF files and the genome vcf files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109. Vep is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.
+Both the normalized standard VCF files and the genome vcf files are then annotated using **[VEP](https://www.ensembl.org/info/docs/tools/vep/index.html)** v109.3. Vep is run with the extra parameters `--assembly GRCh38 --check_existing --pick --variant_class --everything`.
 
 See the [annotation hydra-genetics module](https://hydra-genetics-annotation.readthedocs.io/en/latest/) for additional information.
 
@@ -139,7 +139,7 @@ The report is configured based on a MultiQC config file.
 **[Mosdepth](https://github.com/brentp/mosdepth)** v0.3.2 is used together with a bedfile covering all coding exons (`config[reference][exon_bed]`) and thresholds (`10,20,50`) to calculate coverage.
 
 ### Samtools
-**[Samtools stats](http://www.htslib.org/doc/samtools-stats.html)** v1.15 is run on BWA-mem aligned and merged bam files over the full bedfile (`config[reference][design_bed]`).
+**[Samtools stats](http://www.htslib.org/doc/samtools-stats.html)** v1.15 is run on BWA-mem aligned and merged bam files without any designfile.
 
 ### Picard
 **[Picard](https://broadinstitute.github.io/picard/)** v2.25.4 is run on BWA-mem aligned and merged bam files collecting a number of metrics. The metrics calculated are listed below:
diff --git a/workflow/Snakefile b/workflow/Snakefile
index feedc43..d27f971 100644
--- a/workflow/Snakefile
+++ b/workflow/Snakefile
@@ -235,6 +235,11 @@ use rule picard_collect_multiple_metrics from qc as qc_picard_collect_multiple_m
         extra=lambda wildcards, input: f" INTERVALS={input.intervals}",
 
 
+use rule samtools_stats from qc as qc_samtools_stats with:
+    params:
+        extra="%s " % (config.get("samtools_stats", {}).get("extra", ""),),
+
+
 use rule multiqc from qc as qc_multiqc with:
     input:
         files=lambda wildcards: set(