How is valid_coverage calculated? Different results between total reads and subsampled reads #292

DelphIONe · 2024-10-29T18:10:33Z

I have mapped reads to my genome, then ran modkit pileup and modkit dmr pair using the IVT sample. If I subsample these reads and re-run modkit pileup and then dmr, I get unexpected valid_coverage values. For example, at one position the valid_coverage from my subsample is higher than at the same position in the modkit dmr result obtained with the total reads (with the same IVT reads). How is this possible? How is valid_coverage calculated?

Thanks a lot for your help,

ArtRand · 2024-10-31T00:22:55Z

Hello @DelphIONe,

The valid_coverage is the number of base modification calls (modified of any class, or unmodified) that pass the filter threshold probability. For more details, the documentation has some worked examples. You will likely see different values of valid coverage when you subset your data, since you have a different set of reads and the dynamic threshold estimation will derive a different value. You can run modkit sample-probs (see the documentation) to find a threshold value for a given set of reads, then use it for subsequent experiments.
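To make the effect concrete, here is a minimal Python sketch of the idea, not modkit's actual implementation. The helper names `estimate_threshold` and `valid_coverage` are hypothetical, the confidence values are made up, and the threshold is modeled as a simple percentile of the call confidences (assuming a 10th-percentile filter). It shows how a lower estimated threshold on a subsample can let more calls pass at one position, even though fewer reads cover it.

```python
def estimate_threshold(confidences, filter_percentile=0.1):
    """Toy stand-in for dynamic threshold estimation (an assumption,
    not modkit's code): return the confidence at the given percentile."""
    ordered = sorted(confidences)
    idx = int(filter_percentile * len(ordered))
    return ordered[min(idx, len(ordered) - 1)]


def valid_coverage(site_confidences, threshold):
    """Count the calls at one position whose confidence passes the threshold."""
    return sum(1 for c in site_confidences if c >= threshold)


# Made-up confidences used to estimate the threshold for each run.
full_calls = [0.55, 0.60, 0.80, 0.82, 0.85, 0.88, 0.90, 0.91, 0.92, 0.93,
              0.94, 0.95, 0.95, 0.96, 0.97, 0.97, 0.98, 0.98, 0.99, 0.99]
sub_calls = [0.55, 0.60, 0.85, 0.90, 0.92, 0.94, 0.95, 0.97, 0.98, 0.99]

# Made-up confidences of the calls at one position in each run.
# The subsample has fewer reads at this position than the full set.
full_site = [0.62, 0.65, 0.70, 0.75, 0.78, 0.95]
sub_site = [0.62, 0.70, 0.75, 0.95]

full_thr = estimate_threshold(full_calls)  # 0.80
sub_thr = estimate_threshold(sub_calls)    # 0.60

print(valid_coverage(full_site, full_thr))  # 1 call passes
print(valid_coverage(sub_site, sub_thr))    # 4 calls pass
# With one fixed threshold for both runs, the subsample can never exceed
# the full set at a position, because its reads are a subset.
print(valid_coverage(full_site, 0.80), valid_coverage(sub_site, 0.80))  # 1 1
```

Using one fixed threshold for both runs, for example a value taken from modkit sample-probs and passed to pileup (to my knowledge via the `--filter-threshold` option), removes this source of variation.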

ArtRand added the "question" label (Looking for clarification on inputs and/or outputs) on Oct 31, 2024
