Merge branch 'ar/docs-typos-030' into 'master'
[docs] Fix some typos and clean up.

See merge request machine-learning/modkit!185
ArtRand committed May 29, 2024
2 parents f9320ae + 71c78a5 commit 049122b
Showing 37 changed files with 126 additions and 125 deletions.
2 changes: 1 addition & 1 deletion book/src/SUMMARY.md
@@ -8,13 +8,13 @@
- [Extracting read information to a table](./intro_extract.md)
- [Calling mods in a modBAM](./intro_call_mods.md)
- [Removing modification calls at the ends of reads](./intro_edge_filter.md)
- [Narrow output to specific positions](./intro_include_bed.md)
- [Repair MM/ML tags on trimmed reads](./intro_repair.md)
- [Make hemi-methylation bedMethyl tables](./intro_pileup_hemi.md)
- [Perform differential methylation scoring](./intro_dmr.md)
- [Validate ground truth results](./intro_validate.md)
- [Find highly modified motif sequences](./intro_find_motifs.md)
- [Calculating methylation entropy](./intro_entropy.md)
- [Narrow output to specific positions](./intro_include_bed.md)
- [Extended subcommand help](./advanced_usage.md)
- [Troubleshooting](./troubleshooting.md)
- [Current limitations](./limitations.md)
18 changes: 9 additions & 9 deletions book/src/intro_bedmethyl.md
@@ -163,20 +163,20 @@ CG->CH substitution such that no modification call was produced by the basecalle
| 2 | start position | 0-based start position | int |
| 3 | end position | 0-based exclusive end position | int |
| 4 | modified base code and motif | single letter code for modified base and motif when more than one motif is used | str |
| 5 | score | Equal to N<sub>valid_cov</sub>. | int |
| 5 | score | equal to N<sub>valid_cov</sub> | int |
| 6 | strand | '+' for positive strand '-' for negative strand, '.' when strands are combined | str |
| 7 | start position | included for compatibility | int |
| 8 | end position | included for compatibility | int |
| 9 | color | included for compatibility, always 255,0,0 | str |
| 10 | N<sub>valid_cov</sub> | See definitions above. | int |
| 10 | N<sub>valid_cov</sub> | see definitions above. | int |
| 11 | percent modified | (N<sub>mod</sub> / N<sub>valid_cov</sub>) * 100 | float |
| 12 | N<sub>mod</sub> | See definitions above. | int |
| 13 | N<sub>canonical</sub> | See definitions above. | int |
| 14 | N<sub>other_mod</sub> | See definitions above. | int |
| 15 | N<sub>delete</sub> | See definitions above. | int |
| 16 | N<sub>fail</sub> | See definitions above. | int |
| 17 | N<sub>diff</sub> | See definitions above. | int |
| 18 | N<sub>nocall</sub> | See definitions above. | int |
| 12 | N<sub>mod</sub> | see definitions above | int |
| 13 | N<sub>canonical</sub> | see definitions above | int |
| 14 | N<sub>other_mod</sub> | see definitions above | int |
| 15 | N<sub>delete</sub> | see definitions above | int |
| 16 | N<sub>fail</sub> | see definitions above | int |
| 17 | N<sub>diff</sub> | see definitions above | int |
| 18 | N<sub>nocall</sub> | see definitions above | int |

## Performance considerations

7 changes: 4 additions & 3 deletions book/src/intro_dmr.md
@@ -195,11 +195,11 @@ When performing single-site analysis, the following additional columns are added
|--------|----------------------------|---------------------------------------------------------------------------------------|-------|
| 14 | MAP-based p-value | ratio of the posterior probability of observing the effect size over zero effect size | float |
| 15 | effect size | percent modified in sample A (col 12) minus percent modified in sample B (col 13) | float |
| 16 | balanced MAP-based p-value | mAP-based p-value when all replicates are balanced | float |
| 16 | balanced MAP-based p-value | MAP-based p-value when all replicates are balanced | float |
| 17 | balanced effect size | effect size when all replicates are balanced | float |
| 18 | pct_a_samples | percent of 'a' samples used in statistical test | float |
| 19 | pct_b_samples | percent of 'b' samples used in statistical test | float |
| 20 | per-replicate p-values | mAP-based p-values for matched replicate pairs | float |
| 20 | per-replicate p-values | MAP-based p-values for matched replicate pairs | float |
| 21 | per-replicate effect sizes | effect sizes matched replicate pairs | float |


@@ -257,6 +257,7 @@ modkit dmr pair \

The default settings for the HMM are to run in "coarse-grained" mode which will more eagerly join neighboring sites, potentially at the cost of including sites that are not differentially modified within "Different" blocks.
To activate "fine-grained" mode, pass the `--fine-grained` flag.

The output schema for the segments is:

| column | name | description | type |
@@ -266,7 +267,7 @@ The output schema for the segments is:
| 3 | end position | 0-based exclusive end position, from `--regions` argument | int |
| 4 | state-name | "different" when sites are differentially modified, "same" otherwise | str |
| 5 | score | difference score, more positive values have increased difference | float |
| 6 | N_<sub>sites<\sub> | number of sites (bedmethyl records) in the segment | float |
| 6 | N-sites | number of sites (bedmethyl records) in the segment | float |
| 7 | sample<sub>a</sub> counts | counts of each base modification in the region, comma-separated, for sample A | str |
| 8 | sample<sub>a</sub> total | total number of base modification calls in the region, including unmodified, for sample A | str |
| 9 | sample<sub>b</sub> counts | counts of each base modification in the region, comma-separated, for sample B | str |
2 changes: 1 addition & 1 deletion book/src/intro_edge_filter.md
@@ -23,7 +23,7 @@ All commands have the flag `--invert-edge-filter` that will _keep_ only base mod

## Example usages

### call mods with the estimated threshold and ignore modification calls within 100 base pairs of the ends of the reads
### Call mods with the estimated threshold and ignore modification calls within 100 base pairs of the ends of the reads
```
modkit call-mods <in.bam> <out.bam> --edge-filter 100
```
2 changes: 1 addition & 1 deletion book/src/intro_find_motifs.md
@@ -28,7 +28,7 @@ The human-readable tables are always output to the log and terminal, the machine
| column | name | description | type |
|--------|------------|--------------------------------------------------------------------------------------------------------------------------|-------|
| 1 | mod_code | code specifying the modification found in the motif | str |
0 2 | motif | sequence of identified motif using [IUPAC](https://www.bioinformatics.org/sms/iupac.html) codes | str |
| 2 | motif | sequence of identified motif using [IUPAC](https://www.bioinformatics.org/sms/iupac.html) codes | str |
| 3 | offset | 0-based offset into the motif sequence of the modified base | int |
| 4 | frac_mod | fraction of time this sequence is found in the _high modified_ set col-5 / (col-5 + col-6) | float |
| 5 | high_count | number of occurances of this sequence in the _high-modified_ set | int |
18 changes: 9 additions & 9 deletions book/src/intro_pileup_hemi.md
@@ -84,20 +84,20 @@ patterns (`-,-`). All patterns recognized at a location will be reported in the
| 2 | start position | 0-based start position | int |
| 3 | end position | 0-based exclusive end position | int |
| 4 | methylation pattern | comma-separated pair of modification codes `-` means canonical, followed by the primary read base | str |
| 5 | score | Equal to N<sub>valid_cov</sub>. | int |
| 5 | score | equal to N<sub>valid_cov</sub> | int |
| 6 | strand | always '.' because strand information is combined | str |
| 7 | start position | included for compatibility | int |
| 8 | end position | included for compatibility | int |
| 9 | color | included for compatibility, always 255,0,0 | str |
| 10 | N<sub>valid_cov</sub> | See definitions above. | int |
| 10 | N<sub>valid_cov</sub> | see definitions above | int |
| 11 | fraction modified | N<sub>pattern</sub> / N<sub>valid_cov</sub> | float |
| 12 | N<sub>pattern</sub> | See definitions above. | int |
| 13 | N<sub>canonical</sub> | See definitions above. | int |
| 14 | N<sub>other_pattern</sub> | See definitions above. | int |
| 15 | N<sub>delete</sub> | See definitions above. | int |
| 16 | N<sub>fail</sub> | See definitions above. | int |
| 17 | N<sub>diff</sub> | See definitions above. | int |
| 18 | N<sub>nocall</sub> | See definitions above. | int |
| 12 | N<sub>pattern</sub> | see definitions above | int |
| 13 | N<sub>canonical</sub> | see definitions above | int |
| 14 | N<sub>other_pattern</sub> | see definitions above | int |
| 15 | N<sub>delete</sub> | see definitions above | int |
| 16 | N<sub>fail</sub> | see definitions above | int |
| 17 | N<sub>diff</sub> | see definitions above | int |
| 18 | N<sub>nocall</sub> | see definitions above | int |


## Limitations
4 changes: 2 additions & 2 deletions book/src/intro_summary.md
@@ -1,7 +1,7 @@
# Summarizing a modBAM.

The `modkit summary` sub-command is intended for collecting read-level statistics on
either a sample of reads, a region, or an entire modBam.
The `modkit summary` sub-command is intended for collecting read-level statistics on either a sample of reads, a region, or an entire modBam.
It is important to note that the default behavior of `modkit summary` is to take a sample of the reads to get a quick estimate.

## Summarize the base modification calls in a modBAM.
