Merge branch 'ar/prep-026-release' into 'master'

[release] Update docs and changelog for 0.2.6 release

See merge request machine-learning/modkit!158
nanoporetech · Mar 15, 2024 · 7f6dd3a
2 parents d31a61a + 7a229d8
commit 7f6dd3a

Showing 7 changed files with 52 additions and 6 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [v0.2.6]
+### Fixes
+- [dmr, single-site] Don't require that there are equal numbers of samples for single site DMR with multiple samples. Fixes #140.
+- [dmr, pairwise, region] Protect when zero bedmethyl records are found for a region, fixes #146.
+### Adds
+- [validate] Adds on-the-fly filtering of reads by alignment identity and/or alignment length.
+
+
 ## [v0.2.5]
 ### Fixes
 - [extract] Only emit mapped reads when `--region` is provided, but still emit unmapped bases in those reads unless `--mapped-only` is passed.

diff --git a/book/src/advanced_usage.md b/book/src/advanced_usage.md
@@ -956,6 +956,12 @@ Options:
           
           [possible values: A, C, G, T]
 
+      --min-identity <MIN_ALIGNMENT_IDENTITY>
+          Only use reads with alignment identity >= this number, in Q-space (phred score).
+
+      --min-length <MIN_ALIGNMENT_LENGTH>
+          Remove reads with fewer aligned reference bases than this threshold.
+
   -q, --filter-quantile <FILTER_QUANTILE>
           Filter out modified base calls where the probability of the predicted variant is below
           this confidence percentile. For example, 0.1 will filter out the 10% lowest confidence
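
Note on the `--min-identity` help text added in this hunk: the threshold is given in Q-space rather than as a fraction. The sketch below illustrates that scale, assuming the usual phred convention Q = -10 * log10(1 - identity). This diff does not show how modkit itself computes alignment identity, so the helper names are hypothetical and the formula is an assumption, not the tool's implementation.

```python
import math


def identity_to_q(identity: float) -> float:
    """Convert a fractional alignment identity to a phred-style Q value.

    Assumes Q = -10 * log10(1 - identity); hypothetical helper, not modkit code.
    """
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must be in [0, 1)")
    return -10.0 * math.log10(1.0 - identity)


def q_to_identity(q: float) -> float:
    """Inverse conversion: phred-style Q value back to fractional identity."""
    return 1.0 - 10.0 ** (-q / 10.0)
```

Under this convention, 90% identity corresponds to about Q10 and 99% identity to about Q20, so a threshold of 20 would keep only reads whose alignments are at least roughly 99% identical to the reference.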

diff --git a/docs/advanced_usage.html b/docs/advanced_usage.html
@@ -1111,6 +1111,12 @@ <h2 id="validate"><a class="header" href="#validate">validate</a></h2>
 
           [possible values: A, C, G, T]
 
+      --min-identity &lt;MIN_ALIGNMENT_IDENTITY&gt;
+          Only use reads with alignment identity &gt;= this number, in Q-space (phred score).
+
+      --min-length &lt;MIN_ALIGNMENT_LENGTH&gt;
+          Remove reads with fewer aligned reference bases than this threshold.
+
   -q, --filter-quantile &lt;FILTER_QUANTILE&gt;
           Filter out modified base calls where the probability of the predicted variant is below
           this confidence percentile. For example, 0.1 will filter out the 10% lowest confidence

diff --git a/docs/intro_dmr.html b/docs/intro_dmr.html
@@ -346,7 +346,8 @@ <h2 id="differential-methylation-output-format"><a class="header" href="#differe
 <tr><td>19</td><td>per-replicate effect sizes</td><td>effect sizes matched replicate pairs</td><td>float</td></tr>
 </tbody></table>
 </div>
-<p>Columns 16-19 are only produced when multiple replicates are provided. Columns 18 and 19 have the replicate pairwise MAP-based p-values and effect sizes which are calculated based on their order provided on the command line.
+<p>Columns 16-19 are only produced when an equal number of replicates are provided.
+Columns 18 and 19 have the replicate pairwise MAP-based p-values and effect sizes which are calculated based on their order provided on the command line.
 For example in the abbreviated command below:</p>
 <pre><code class="language-bash">modkit dmr pair \
   -a ${norm_pileup_1}.gz \
@@ -356,7 +357,16 @@ <h2 id="differential-methylation-output-format"><a class="header" href="#differe
   ...
 </code></pre>
 <p>Column 18 will contain the MAP-based p-value comparing <code>norm_pileup_1</code> versus <code>tumor_pileup_1</code> and <code>norm_pileup_2</code> versus <code>norm_pileup_2</code>.
-Column 19 will contain the effect sizes, values are comma-separated.</p>
+Column 19 will contain the effect sizes, values are comma-separated.
+If you have a different number of samples for each condition, such as:</p>
+<pre><code class="language-bash">modkit dmr pair \
+  -a ${norm_pileup_1}.gz \
+  -a ${norm_pileup_2}.gz \
+  -a ${norm_pileup_3}.gz \
+  -b ${tumor_pileup_1}.gz \
+  -b ${tumor_pileup_2}.gz \
+</code></pre>
+<p>these columns will not be present.</p>
 
                     </main>
 

diff --git a/docs/print.html b/docs/print.html
@@ -938,7 +938,8 @@ <h2 id="differential-methylation-output-format"><a class="header" href="#differe
 <tr><td>19</td><td>per-replicate effect sizes</td><td>effect sizes matched replicate pairs</td><td>float</td></tr>
 </tbody></table>
 </div>
-<p>Columns 16-19 are only produced when multiple replicates are provided. Columns 18 and 19 have the replicate pairwise MAP-based p-values and effect sizes which are calculated based on their order provided on the command line.
+<p>Columns 16-19 are only produced when an equal number of replicates are provided.
+Columns 18 and 19 have the replicate pairwise MAP-based p-values and effect sizes which are calculated based on their order provided on the command line.
 For example in the abbreviated command below:</p>
 <pre><code class="language-bash">modkit dmr pair \
   -a ${norm_pileup_1}.gz \
@@ -948,7 +949,16 @@ <h2 id="differential-methylation-output-format"><a class="header" href="#differe
   ...
 </code></pre>
 <p>Column 18 will contain the MAP-based p-value comparing <code>norm_pileup_1</code> versus <code>tumor_pileup_1</code> and <code>norm_pileup_2</code> versus <code>norm_pileup_2</code>.
-Column 19 will contain the effect sizes, values are comma-separated.</p>
+Column 19 will contain the effect sizes, values are comma-separated.
+If you have a different number of samples for each condition, such as:</p>
+<pre><code class="language-bash">modkit dmr pair \
+  -a ${norm_pileup_1}.gz \
+  -a ${norm_pileup_2}.gz \
+  -a ${norm_pileup_3}.gz \
+  -b ${tumor_pileup_1}.gz \
+  -b ${tumor_pileup_2}.gz \
+</code></pre>
+<p>these columns will not be present.</p>
 <div style="break-before: page; page-break-before: always;"></div><h1 id="validating-ground-truth-results"><a class="header" href="#validating-ground-truth-results">Validating ground truth results.</a></h1>
 <p>The <code>modkit validate</code> sub-command is intended for validating results in a uniform manner from samples with known modified base content. Specifically the modified base status at any annotated reference location should be known.</p>
 <h2 id="validating-from-modbam-reads-and-bed-reference-annotation"><a class="header" href="#validating-from-modbam-reads-and-bed-reference-annotation">Validating from modBAM reads and BED reference annotation.</a></h2>
@@ -1959,6 +1969,12 @@ <h2 id="validate"><a class="header" href="#validate">validate</a></h2>
 
           [possible values: A, C, G, T]
 
+      --min-identity &lt;MIN_ALIGNMENT_IDENTITY&gt;
+          Only use reads with alignment identity &gt;= this number, in Q-space (phred score).
+
+      --min-length &lt;MIN_ALIGNMENT_LENGTH&gt;
+          Remove reads with fewer aligned reference bases than this threshold.
+
   -q, --filter-quantile &lt;FILTER_QUANTILE&gt;
           Filter out modified base calls where the probability of the predicted variant is below
           this confidence percentile. For example, 0.1 will filter out the 10% lowest confidence

diff --git a/docs/searchindex.js b/docs/searchindex.js
diff --git a/docs/searchindex.json b/docs/searchindex.json