From 157d5306ef7bf3d4f9e24d1d4e0f1802fd1d242d Mon Sep 17 00:00:00 2001 From: Art Rand Date: Tue, 5 Mar 2024 00:40:07 +0000 Subject: [PATCH] update changelog 0.2.5 and build docs for 0.2.5 --- CHANGELOG.md | 12 + book/src/advanced_usage.md | 103 +++-- docs/404.html | 55 ++- docs/advanced_usage.html | 202 ++++++--- docs/algo_details.html | 61 ++- docs/book.js | 67 +-- docs/collapse.html | 59 ++- docs/css/chrome.css | 137 ++++-- docs/css/general.css | 51 ++- docs/css/print.css | 10 +- docs/css/variables.css | 24 + docs/dmr_scoring_details.html | 321 ++++++++++++++ docs/filtering.html | 67 ++- docs/filtering_details.html | 59 ++- docs/filtering_numeric_details.html | 61 ++- docs/highlight.js | 49 ++- docs/images/beta_distributions.png | Bin 0 -> 40805 bytes docs/images/estimated_map_pvalue2.png | Bin 0 -> 31749 bytes docs/index.html | 61 ++- docs/intro_adjust.html | 63 ++- docs/intro_bedmethyl.html | 73 +++- docs/intro_call_mods.html | 59 ++- docs/intro_dmr.html | 245 ++++++----- docs/intro_edge_filter.html | 65 ++- docs/intro_extract.html | 89 ++-- docs/intro_include_bed.html | 61 ++- docs/intro_motif_bed.html | 59 ++- docs/intro_pileup_hemi.html | 95 ++-- docs/intro_repair.html | 61 ++- docs/intro_summary.html | 59 ++- docs/intro_validate.html | 61 ++- docs/limitations.html | 64 ++- docs/perf_considerations.html | 61 ++- docs/print.html | 608 +++++++++++++++++--------- docs/quick_start.html | 61 ++- docs/searcher.js | 2 +- docs/searchindex.js | 2 +- docs/searchindex.json | 2 +- docs/tomorrow-night.css | 4 +- docs/troubleshooting.html | 71 ++- generate_advanced_usage.sh | 2 +- src/dmr/subcommands.rs | 3 +- src/extract/subcommand.rs | 2 +- 43 files changed, 2398 insertions(+), 873 deletions(-) create mode 100644 docs/dmr_scoring_details.html create mode 100644 docs/images/beta_distributions.png create mode 100644 docs/images/estimated_map_pvalue2.png diff --git a/CHANGELOG.md b/CHANGELOG.md index 7140613..f4e650c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 
+4,18 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [v0.2.5] +### Fixes +- [extract] Only emit mapped reads when `--region` is provided, but still emit unmapped bases in those reads unless `--mapped-only` is passed. +- [extract] Performance improvement due to better tracking of interval boundaries. +- [repair] Updates the `MN` tag on repaired records. +### Adds +- [dmr, single-site] Refactor `dmr pair` without regions (i.e. single site analysis) to increase performance. +- [dmr, single-site] Add estimated MAP-based p-value to output. +- [all] Allows BED3 input for all options that use `--include-bed`. Strand will be assumed to be BOTH (equivalent to '.'). +- [extract] Increases the kmer size limit to 50. + + ## [v0.2.5-rc2] ### Fixes - [all] Reads with entirely implicit canonical calls are no longer skipped for "modbase info empty" or similar. diff --git a/book/src/advanced_usage.md b/book/src/advanced_usage.md index c07ca5a..4334c4a 100644 --- a/book/src/advanced_usage.md +++ b/book/src/advanced_usage.md @@ -9,6 +9,8 @@ modified base information (modBAMs). The various sub-commands and tools availabl ```text +Modkit is a bioinformatics tool for working with modified bases from Oxford Nanopore + Usage: modkit Commands: @@ -764,8 +766,8 @@ Options: Hide the progress bar. --kmer-size - Set the query and reference k-mer size (if a reference is provided). Maxumum number for - this value is 12. + Set the query and reference k-mer size (if a reference is provided). Maximum number for + this value is 50. [default: 5] @@ -951,19 +953,19 @@ Options: Canonical base to evaluate. By default, this will be derived from mod codes in ground truth BED files. For ground truth with only canonical sites and/or ChEBI codes this values must be set. 
- + [possible values: A, C, G, T] -q, --filter-quantile Filter out modified base calls where the probability of the predicted variant is below this confidence percentile. For example, 0.1 will filter out the 10% lowest confidence modification calls. - + [default: 0.1] -t, --threads Number of threads to use. - + [default: 4] --suppress-progress @@ -1153,7 +1155,7 @@ Options: Print help information (use `-h` for a summary). ``` -## pileup-hemi `pair` +## dmr pair ```text Compare regions in a pair of samples (for example, tumor and normal or control and experiment). A sample is input as a bgzip pileup bedMethyl (produced by pileup, for example) that has an associated @@ -1167,66 +1169,116 @@ Options: Bgzipped bedMethyl file for the first (usually control) sample. There should be a tabix index with the same name and .tbi next to this file or the --index-a option must be provided. + -b Bgzipped bedMethyl file for the second (usually experimental) sample. There should be a tabix index with the same name and .tbi next to this file or the --index-b option must be provided. + -o, --out-path - Path to file to direct output, optional, no argument will direct output to stdout. + Path to file to direct output, optional, no argument will direct output to stdout + + --header + Include header in output. + -r, --regions-bed BED file of regions over which to compare methylation levels. Should be tab-separated (spaces allowed in the "name" column). Requires chrom, chromStart and chromEnd. The Name column is optional. Strand is currently ignored. When omitted, methylation levels are - compared at each site in the `-a`/`control_bed_methyl` BED file (or optionally, the - `-b`/`exp_bed_methyl` file with the `--use-b` flag. - --use-b - When performing site-level DMR, use the bedMethyl indicated by the -b/exp_bed_methyl - argument to collect bases to score. + compared at each site. + --ref - Path to reference fasta for the pileup. + Path to reference fasta for used in the pileup/alignment. 
+ -m Bases to use to calculate DMR, may be multiple. For example, to calculate differentially methylated regions using only cytosine modifications use --base C. + --log-filepath File to write logs to, it's recommended to use this option. + -t, --threads - Number of threads to use [default: 4] + Number of threads to use. + + [default: 4] + --batch-size Control the batch size. The batch size is the number of regions to load at a time. Each region will be processed concurrently. Loading more regions at a time will decrease IO to load data, but will use more memory. Default will be 50% more than the number of threads assigned. + -k, --mask Respect soft masking in the reference FASTA. + --suppress-progress Don't show progress bars. + -f, --force Force overwrite of output file, if it already exists. + --index-a Path to tabix index associated with -a (--control-bed-methyl) bedMethyl file. + --index-b Path to tabix index associated with -b (--exp-bed-methyl) bedMethyl file. + --missing How to handle regions found in the `--regions` BED file. quiet => ignore regions that are not found in the tabix header warn => log (debug) regions that are missing fatal => log - (error) and exit the program when a region is missing. [default: warn] [possible values: - quiet, warn, fail] + (error) and exit the program when a region is missing. + + [default: warn] + [possible values: quiet, warn, fail] + --min-valid-coverage Minimum valid coverage required to use an entry from a bedMethyl. See the help for pileup - for the specification and description of valid coverage. [default: 0] + for the specification and description of valid coverage. + + [default: 0] + + --prior + Prior distribution for estimating MAP-based p-value. Should be two arguments for alpha and + beta (e.g. 1.0 1.0). See `dmr_scoring_details.md` for additional details on how the metric + is calculated. + + --delta + Consider only effect sizes greater than this when calculating the MAP-based p-value. 
+ + [default: 0.05] + + -N, --n-sample-records + Sample this many reads when estimating the max coverage thresholds. + + [default: 10042] + + --max-coverages + Max coverages to enforce when calculating estimated MAP-based p-value. + + --cap-coverages + When using replicates, cap coverage to be equal to the maximum coverage for a single + sample. For example, if there are 3 replicates with max_coverage of 30, the total coverage + would normally be 90. Using --cap-coverages will down sample the data to 30X. + + -i, --interval-size + Interval chunk size in base pairs to process concurrently. Smaller interval chunk sizes + will use less memory but incur more overhead. + + [default: 100000] + -h, --help - Print help information. + Print help information (use `-h` for a summary). ``` -## pileup-hemi +## dmr multi ```text Compare regions between all pairs of samples (for example a trio sample set or haplotyped trio sample set). As with `pair` all inputs must be bgzip compressed bedMethyl files with associated tabix indices. Each sample must be assigned a name. Output is a directory of BED files with the score column indicating the magnitude of the difference in methylation between the two samples -indicated in the file name. See the online documentation for additional details. +indicated in the file name. See the online documentation for additional details -Usage: modkit dmr multi [OPTIONS] --out-dir --ref +Usage: modkit dmr multi [OPTIONS] --regions-bed --out-dir --ref Options: -s, --sample @@ -1239,8 +1291,9 @@ Options: -r, --regions-bed BED file of regions over which to compare methylation levels. Should be tab-separated (spaces allowed in the "name" column). Requires chrom, chromStart and chromEnd. The Name - column is optional. Strand is currently ignored. When omitted, methylation levels are - compared at each site in common between the two bedMethyl files being compared. + column is optional. Strand is currently ignored. + --header + Include header in output. 
-o, --out-dir Directory to place output DMR results in BED format. -p, --prefix @@ -1259,7 +1312,7 @@ Options: --suppress-progress Don't show progress bars. -f, --force - Force overwrite of output file, if it already exists + Force overwrite of output file, if it already exists. --missing How to handle regions found in the `--regions` BED file. quiet => ignore regions that are not found in the tabix header warn => log (debug) regions that are missing fatal => log @@ -1269,5 +1322,5 @@ Options: Minimum valid coverage required to use an entry from a bedMethyl. See the help for pileup for the specification and description of valid coverage. [default: 0] -h, --help - Print help information + Print help information. ``` diff --git a/docs/404.html b/docs/404.html index 394071d..5a94111 100644 --- a/docs/404.html +++ b/docs/404.html @@ -1,5 +1,5 @@ - + @@ -11,7 +11,7 @@ - + @@ -35,7 +35,7 @@ - +