how to implement EM using kb? #51

Open
jc271828 opened this issue Mar 17, 2023 · 2 comments

Comments

@jc271828

Hi,

I was wondering whether, and how, I can choose between the EM algorithm and the "simpler" multimapping option that distributes reads evenly across genes when using kb to count reads. Also, because my experiment used 10x Genomics technology (which captures sequences adjacent to the polyA tail), are the reads expected to be strongly 3'-end biased? If so, can the EM algorithm accurately assign reads that map to both Gene A's 3' end and Gene B's 5' end? As I picture it, those reads are more "likely" to come from Gene A transcripts. Thanks for your time!

Jingxian

@Yenaled

Yenaled commented Mar 17, 2023

Yeah, you can choose either option (see kb count, which supports both). I haven't really seen a benefit from doing so, though: with all reads coming from the 3' end, you can't resolve ambiguities the way you can with bulk data.
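To make the difference between the two options concrete, here is a toy sketch of splitting ambiguous reads evenly versus with a plain EM. It is an illustration only, not the code that kb or its underlying tools run. The function names `split_uniform` and `split_em` and the example counts are made up for this sketch, and it ignores UMIs, transcript length, and read position.

```python
from collections import Counter


def split_uniform(ec_counts):
    """Give every gene in an equivalence class an equal share of its reads."""
    totals = Counter()
    for genes, n in ec_counts.items():
        for g in genes:
            totals[g] += n / len(genes)
    return dict(totals)


def split_em(ec_counts, n_iter=100):
    """Share ambiguous reads in proportion to iteratively estimated abundances."""
    all_genes = sorted({g for genes in ec_counts for g in genes})
    total = sum(ec_counts.values())
    # Start from equal relative abundances.
    abundance = {g: 1.0 / len(all_genes) for g in all_genes}
    assigned = dict.fromkeys(all_genes, 0.0)
    for _ in range(n_iter):
        # E-step: split each class by the current abundance estimates.
        assigned = dict.fromkeys(all_genes, 0.0)
        for genes, n in ec_counts.items():
            denom = sum(abundance[g] for g in genes)
            if denom == 0:
                continue
            for g in genes:
                assigned[g] += n * abundance[g] / denom
        # M-step: re-estimate abundances from the expected counts.
        abundance = {g: assigned[g] / total for g in all_genes}
    return assigned


# Hypothetical counts: 90 reads unique to A, 10 unique to B, 100 shared.
example = {("A",): 90, ("B",): 10, ("A", "B"): 100}
uniform = split_uniform(example)
em = split_em(example)
print(uniform)
print(em)
```

In this made-up example the even split gives the shared reads 50/50, while the EM leans on the uniquely mapped reads and hands most of the shared reads to Gene A. When few reads map uniquely, as is common with 3'-end data, the EM has little to lean on.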

As for your question about the EM algorithm telling apart reads at Gene A's 3' end from reads at Gene B's 5' end: no, that is not supported. Many things would have to be considered for such a model to work (internal polyA tracts, the distribution of mapping locations, fragment modeling, etc.), and we're unsure how much value we'd actually gain from fitting such models. We hope to look into it at some point, though.

@jc271828
Author

Thank you for such a timely response! That makes sense. I guess the benefit of developing a better-fitting model may partly depend on how "overlapping"/"adjacent" genes are in the reference genome. I'm working with C. elegans, and in the annotation version I'm using, over 10% of genes overlap. Hmm, so I'll probably not worry about this too much for now, but I really look forward to seeing future workarounds for this!
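For anyone who wants to check the overlap figure for their own annotation, here is a rough sketch of how the fraction of overlapping genes could be computed. It assumes gene coordinates have already been extracted as `(chrom, start, end)` tuples with an exclusive end, and it ignores strand. The function name `fraction_overlapping` and the example coordinates are hypothetical.

```python
def fraction_overlapping(genes):
    """Fraction of genes that overlap at least one other gene on the same chromosome.

    genes: list of (chrom, start, end) tuples, end exclusive, strand ignored.
    """
    if not genes:
        return 0.0
    # Visit genes sorted by chromosome, then start coordinate.
    order = sorted(range(len(genes)), key=lambda i: genes[i])
    overlapping = set()
    prev_chrom, max_end, max_idx = None, -1, None
    for i in order:
        chrom, start, end = genes[i]
        if chrom != prev_chrom:
            # New chromosome: reset the furthest-reaching gene seen so far.
            prev_chrom, max_end, max_idx = chrom, end, i
            continue
        if start < max_end:
            # This gene starts before the furthest-reaching earlier gene ends.
            overlapping.add(i)
            overlapping.add(max_idx)
        if end > max_end:
            max_end, max_idx = end, i
    return len(overlapping) / len(genes)


# Made-up coordinates: the first two genes on chromosome I overlap each other.
toy_genes = [("I", 0, 100), ("I", 50, 150), ("I", 200, 300), ("II", 0, 100)]
overlap_fraction = fraction_overlapping(toy_genes)
print(overlap_fraction)
```

Counting only same-strand overlaps, or only overlaps near the 3' ends, would give a smaller number that is probably closer to what matters for 3'-end reads.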
