Merge pull request #38 from clinical-genomics-uppsala/develop

refactor: remove SampleSheet and misc for "new" stackstorm
clinical-genomics-uppsala · Apr 24, 2024 · d45b5f1
2 parents e4ca12c + ce19c63

Showing 17 changed files with 390 additions and 403 deletions.
diff --git a/.tests/integration/SampleSheet.csv b/.tests/integration/SampleSheet.csv
diff --git a/README.md b/README.md
@@ -32,17 +32,22 @@ The workflow repository contains a small test dataset (:exclamation: Todo: as of
 
 ```bash
 $ cd .tests/integration
-$ snakemake -n -s ../../workflow/Snakefile --configfiles ../../config/config.yaml config.yaml --config sequenceid="990909_test"
+$ snakemake -n -s ../../workflow/Snakefile --configfiles ../../config/config.yaml config.yaml --config sequenceid="990909_test" PATH_TO_REPO=/folder/containing/marple_rd_tc/
 ```
+> **_NOTE:_** If the variable `PATH_TO_REPO` is used in the config file, it must be defined on the command line and point to the folder that contains the `marple_rd_tc` checkout.
+
 
 ## :rocket: [Usage](https://marple-rd-tc.readthedocs.io/en/latest/running/)
 
 To run this pipeline, the `samples.tsv`, `units.tsv`, `resources.yaml`, and `config.yaml` files need to be available in the current directory (or otherwise specified in `config.yaml`). You always need to specify the config file and the `sequenceid` variable in the command. To run the pipeline:
 
 ```bash
-$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" -s /path/to/marple_rd_tc/workflow/Snakefile
+$ snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="990909_test" PATH_TO_REPO=/folder/containing/marple_rd_tc/ -s /path/to/marple_rd_tc/workflow/Snakefile
 ```
 
+> **_NOTE:_** If the variable `PATH_TO_REPO` is used in the config, it must be defined on the command line and point to the folder that contains the `marple_rd_tc` checkout.
+
+
 ## :books: [Output files](https://marple-rd-tc.readthedocs.io/en/latest/result_files/)
 
 The following output files are located in `Results/`-folder:

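The README notes require `PATH_TO_REPO` to be passed on the command line. As a minimal sketch, assuming the checkout folder is named `marple_rd_tc` (the name `config.yaml` appends to the placeholder), the hypothetical shell function `marple_path_to_repo` below derives that value from any directory inside the checkout, for example `.tests/integration`.

```bash
# Hypothetical helper (not part of the repository): print the folder that
# contains the marple_rd_tc checkout, i.e. the value PATH_TO_REPO expects.
marple_path_to_repo() {
  local dir
  # Start from the given directory (default: the current one) as an absolute path
  dir="$(cd "${1:-.}" && pwd)" || return 1
  # Walk upwards until we reach the checkout folder itself
  while [[ "$dir" != "/" && "$(basename "$dir")" != "marple_rd_tc" ]]; do
    dir="$(dirname "$dir")"
  done
  if [[ "$(basename "$dir")" != "marple_rd_tc" ]]; then
    echo "not inside a marple_rd_tc checkout" >&2
    return 1
  fi
  # PATH_TO_REPO is the parent of marple_rd_tc, not the checkout itself
  dirname "$dir"
}

# Example (integration test dry run):
#   cd .tests/integration
#   snakemake -n -s ../../workflow/Snakefile \
#     --configfiles ../../config/config.yaml config.yaml \
#     --config sequenceid="990909_test" PATH_TO_REPO="$(marple_path_to_repo)"
```

Both config values are given after a single `--config` flag, which accepts several `KEY=VALUE` pairs.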
diff --git a/config/config.yaml b/config/config.yaml
@@ -2,7 +2,7 @@
 resources: "resources.yaml"
 samples: "samples.tsv"
 units: "units.tsv"
-output: "/projects/wp3/nobackup/TwistCancer/Bin/marple_rd_tc/config/output_files.yaml"
+output: "{{PATH_TO_REPO}}/marple_rd_tc/config/output_files.yaml"
 
 default_container: "docker://hydragenetics/common:1.8.1"
 
@@ -67,7 +67,7 @@ multiqc:
   reports:
     DNA:
       included_unit_types: ["T", "N"]
-      config: "/projects/wp3/nobackup/TwistCancer/Bin/marple_rd_tc/config/multiqc_config.yaml"
+      config: "{{PATH_TO_REPO}}/marple_rd_tc/config/multiqc_config.yaml"
       qc_files:
         - "prealignment/fastp_pe/{sample}_{type}_{flowcell}_{lane}_{barcode}_fastp.json"
         - "qc/fastqc/{sample}_{type}_{flowcell}_{lane}_{barcode}_{read}_fastqc.zip"
@@ -119,9 +119,6 @@ picard_collect_multiple_metrics:
 picard_mark_duplicates:
   container: "docker://hydragenetics/picard:2.25.4"
 
-sample_order_multiqc:
-  sample_sheet: "SampleSheet.csv"
-
 vep:
   container: "docker://ensemblorg/ensembl-vep:release_109.3" # "docker://hydragenetics/vep:109"
   vep_cache: "/data/ref_genomes/VEP"

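`config.yaml` now locates `output_files.yaml` and `multiqc_config.yaml` through the `{{PATH_TO_REPO}}` placeholder. To preview which files a given value would resolve to, the hypothetical function `render_path_to_repo` below prints the config with every placeholder replaced. It assumes a plain textual substitution; how the pipeline itself expands the placeholder is not shown in this diff.

```bash
# Hypothetical helper: print a config file with every literal "{{PATH_TO_REPO}}"
# replaced by the given folder (a trailing "/" is dropped first).
# Assumes plain text substitution; the pipeline's own mechanism may differ.
render_path_to_repo() {
  local config="$1" repo_parent="${2%/}"
  awk -v val="$repo_parent" '
    {
      out = ""
      # index() is a literal (non-regex) search, so the braces need no escaping
      while ((i = index($0, "{{PATH_TO_REPO}}")) > 0) {
        out = out substr($0, 1, i - 1) val
        $0 = substr($0, i + length("{{PATH_TO_REPO}}"))
      }
      print out $0
    }' "$config"
}

# Example: show the resolved paths for a checkout under $HOME/pipelines
#   render_path_to_repo config/config.yaml "$HOME/pipelines" | grep 'marple_rd_tc/config'
```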
diff --git a/docs/includes/images/qc.dot b/docs/includes/images/qc.dot
@@ -18,11 +18,9 @@ digraph snakemake_dag {
 	p_align[label = "qc_picard_collect_alignment_summary_metrics", color = "0.29 0.6 0.85", style="rounded"];
 	p_dup[label = "qc_picard_collect_duplication_metrics", color = "0.34 0.6 0.85", style="rounded"];
 	sampleorder[label = "sample_order_multiqc", color = "0.00 0.6 0.85", style="rounded"];
-	samplesheet[label = "SampleSheet.csv", color = "0.0 0.0 0.0", style="dotted"];
 
 	multiqc -> multiqc_html
 	sampleorder -> multiqc
-	samplesheet -> sampleorder
 	fastp -> multiqc
 	fastp -> bam [style="dotted", label = "alignment", fontcolor = "grey50", fontsize=9, fontname=sans ]
 	p_gc -> multiqc

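This commit updates `qc.png` together with `qc.dot`. One way to regenerate the image after editing the DAG, assuming Graphviz's `dot` program is installed, is sketched below; the function name `render_dag` is hypothetical and this is not necessarily how the committed PNG was produced.

```bash
# Hypothetical helper: render a Graphviz DOT file (e.g. qc.dot) to PNG.
# The output defaults to the same path with ".dot" replaced by ".png".
render_dag() {
  local dot_file="$1" png_file="${2:-${1%.dot}.png}"
  if ! command -v dot >/dev/null 2>&1; then
    echo "Graphviz 'dot' not found on PATH" >&2
    return 1
  fi
  dot -Tpng -o "$png_file" "$dot_file"
}

# Example:
#   render_dag docs/includes/images/qc.dot
```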
diff --git a/docs/includes/images/qc.png b/docs/includes/images/qc.png
diff --git a/docs/result_files.md b/docs/result_files.md
@@ -33,7 +33,7 @@ The report is configured based on a MultiQC config file.
 ///
 
 ### General Statistics
-The general statistics table are ordered based on the sample order in `SampleSheet.csv`, this is done by renaming the samples in two steps using the script `sample_order_multiqc.py`. To toggle between "Sample Order" and "Sample Name" use the buttons just above General Stats header.
+The general statistics table is ordered by the "S" index in the FASTQ file names, e.g. `sampleT_S1_R1_001.fastq.gz` comes before `sampleA_S2_R1_001.fastq.gz`. This is done by renaming the samples in two steps using the script `sample_order_multiqc.py`. To toggle between "Sample Order" and "Sample Name", use the buttons just above the General Stats header.
 
 <br />
 

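The ordering rule can be illustrated with a small shell sketch. It is not `sample_order_multiqc.py`: the function `order_by_s_index` is hypothetical and assumes Illumina-style names ending in `_S<number>_R1_001.fastq.gz` or `_S<number>_R2_001.fastq.gz`. In this sketch the index is compared numerically, so `S10` sorts after `S2`.

```bash
# Hypothetical helper: print FASTQ file names ordered by their "S" index
order_by_s_index() {
  local f name re='_S([0-9]+)_R[12]_001\.fastq\.gz$'
  for f in "$@"; do
    name="$(basename "$f")"
    if [[ $name =~ $re ]]; then
      # Emit "<S-index><TAB><file>" so sort can use the index as a numeric key
      printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$f"
    else
      echo "skipping $f: no _S<n>_R1/R2_001.fastq.gz suffix" >&2
    fi
  done | sort -t $'\t' -k1,1n | cut -f2
}

# Example:
#   order_by_s_index sampleA_S2_R1_001.fastq.gz sampleT_S1_R1_001.fastq.gz
```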
diff --git a/docs/running.md b/docs/running.md
@@ -37,7 +37,7 @@ git clone --branch ${VERSION} https://github.com/clinical-genomics-uppsala/marpl
 To run the Marple pipeline, a Python virtual environment is needed. Create a virtual environment and then install the pipeline requirements specified in `requirements.txt`.
 ```bash
 # Create a new virtual environment
-python3 -m venv ${WORKING_DIRECTORY}/virtual/environment
+python3.9 -m venv ${WORKING_DIRECTORY}/virtual/environment
 
 # Enter working directory
 cd ${WORKING_DIRECTORY}
@@ -88,7 +88,6 @@ An `resources.yaml` file can also be found in the `config/`-folder. This is adap
 source virtual/environment/bin/activate
 
 # Run snakemake command with the extra config parameter called sequenceid
-snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="230202-test" -s /path/to/marple/workflow/Snakefile
-
+snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="230202-test" PATH_TO_REPO=/folder/containing/marple_rd_tc/ -s /path/to/marple/workflow/Snakefile
 ```
 
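The updated instructions create the virtual environment with `python3.9` instead of `python3`. A hedged sketch of a pre-flight check, using the hypothetical function `require_python39`, that fails early if that interpreter is missing or reports a different version:

```bash
# Hypothetical helper: succeed only if the given interpreter (default python3.9)
# exists and reports version 3.9, before it is used to create the venv.
require_python39() {
  local py="${1:-python3.9}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "$py not found on PATH" >&2
    return 1
  fi
  if ! "$py" -c 'import sys; sys.exit(0 if sys.version_info[:2] == (3, 9) else 1)'; then
    echo "$py is not Python 3.9" >&2
    return 1
  fi
}

# Example:
#   require_python39 && python3.9 -m venv "${WORKING_DIRECTORY}/virtual/environment"
```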
diff --git a/docs/running_ref.md b/docs/running_ref.md
@@ -6,9 +6,12 @@ To generate `.bam` **and** `.bai`-files for all samples you need to run Marple u
 
 ```bash
 # Run snakemake command with the extra config parameter called sequenceid
-snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="normal_samples" -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed
+snakemake --profile snakemakeprofile --configfile config.yaml --config sequenceid="normal_samples" PATH_TO_REPO=/folder/containing/marple_rd_tc/ -s /path/to/marple/workflow/Snakefile --no-temp --until qc_mosdepth_bed
 
 ```
+
+> **_NOTE:_** If the variable `PATH_TO_REPO` (the folder containing `marple_rd_tc`) is used in the config file, it must be defined on the command line.
+
 ### :books: Input files 
 Four different files need to be available in your runfolder and to be adapted to your compute-environment and sequence run; `samples.tsv`, `units_references.tsv`, `config_references.yaml` and `resources.yaml`.
 #### Samples and Units

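Because this reference run is meant to leave both `.bam` and `.bai` files behind, a quick check that every BAM file has an index can catch an incomplete run before the references are built. The function `check_bam_index` and the folder passed to it are hypothetical; point it at wherever your BAM files end up.

```bash
# Hypothetical helper: report every BAM file in a folder that lacks an index,
# accepting both "sample.bam.bai" and "sample.bai" naming.
check_bam_index() {
  local dir="$1" bam missing=0
  for bam in "$dir"/*.bam; do
    # An unmatched glob stays literal, so skip it explicitly
    [[ -e "$bam" ]] || continue
    if [[ ! -f "$bam.bai" && ! -f "${bam%.bam}.bai" ]]; then
      echo "missing index: $bam" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_bam_index /path/to/bam/folder
```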
diff --git a/docs/softwares.md b/docs/softwares.md
@@ -107,7 +107,7 @@ Rules that creates a `.xlsx` file per sample with aggregated coverage informatio
 ---
 
 ## sample_order_multiqc.smk
-A python script to create sample_replacement and sample_order files to be used in MultiQC to order samples based on order in SampleSheet.csv 
+A Python script that creates the sample_replacement and sample_order files used by MultiQC to order samples by the "S" index in the sample names.
 
 ### :snake: Rule