How is valid_coverage calculated? Different results between total reads and subsampled reads #292

Open
DelphIONe opened this issue Oct 29, 2024 · 1 comment
Labels
question Looking for clarification on inputs and/or outputs

Comments

@DelphIONe

I have mapped reads to my genome, then ran modkit pileup and modkit dmr pair using the IVT sample. If I sub-sample these reads and re-run modkit pileup and then dmr, I get strange valid_coverage values. For example, at one position the valid_coverage from my sub-sample of reads is higher than at the same position in the modkit dmr result obtained with the total reads (with the same IVT reads). How is this possible? How is valid_coverage calculated?

Thanks a lot for your help,

@ArtRand
Contributor

ArtRand commented Oct 31, 2024

Hello @DelphIONe,

The valid_coverage is the number of base modification calls (modified of any class, or unmodified) that pass the filter threshold probability. For more details, the documentation has some worked examples. You will likely see different values of valid coverage when you subset your data, since you have a different number of reads and the dynamic threshold estimation will derive a different value. You can run modkit sample-probs (documentation here) to find a threshold value for a given set of reads, then use it for subsequent experiments.
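
To make the effect described above concrete, here is a toy sketch in Python. It is not modkit's implementation: the helper names (`estimate_threshold`, `valid_coverage`), the confidence values, and the use of the 10th percentile as the estimated threshold are all assumptions made for illustration. It only shows how a threshold estimated from the data can let a subsample report a higher valid coverage at one position than the full read set.

```python
# Toy model only: the names and numbers below are hypothetical, not modkit internals.

def estimate_threshold(confidences, percent=10):
    """Return the value at the given percentile of the call confidences.

    Assumption: the dynamic threshold behaves like a low percentile of the
    confidences observed in whatever read set is supplied.
    """
    ordered = sorted(confidences)
    idx = (percent * len(ordered)) // 100  # integer arithmetic, no float rounding
    return ordered[min(idx, len(ordered) - 1)]


def valid_coverage(site_confidences, threshold):
    """Count the calls at one position whose confidence passes the threshold."""
    return sum(1 for c in site_confidences if c >= threshold)


# Confidences of the calls covering one position in the full read set.
site_full = [0.70, 0.75, 0.80, 0.95, 0.97]
# Calls at other positions that also feed the threshold estimate.
other_full = [0.90] * 45

# A subsample that drops one read at the position and most reads elsewhere.
site_sub = [0.70, 0.75, 0.80, 0.95]
other_sub = [0.90] * 6

# Threshold estimated separately from each read set.
thr_full = estimate_threshold(site_full + other_full)
thr_sub = estimate_threshold(site_sub + other_sub)

cov_full = valid_coverage(site_full, thr_full)
cov_sub = valid_coverage(site_sub, thr_sub)
print(f"full:      threshold={thr_full}, valid_coverage={cov_full}")
print(f"subsample: threshold={thr_sub}, valid_coverage={cov_sub}")

# With one fixed threshold applied to both, the subsample cannot exceed the full set.
fixed = 0.90
fixed_full = valid_coverage(site_full, fixed)
fixed_sub = valid_coverage(site_sub, fixed)
print(f"fixed {fixed}: full={fixed_full}, subsample={fixed_sub}")
```

In this toy data the subsample has fewer reads at the position but a lower estimated threshold, so more of its calls pass. Applying one fixed threshold to both runs, as suggested above, removes that source of variation.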

@ArtRand ArtRand added the question Looking for clarification on inputs and/or outputs label Oct 31, 2024