Feasibility of including deduplicated alignments #109

ijhoskins · 2020-07-09T16:04:53Z

I see that the dropEst program reports a matrix of counts for genes in the input GTF. Would it be feasible to support output of a deduplicated BAM as well? I am not interested in scRNA-seq counts but rather the ability of your pipeline to identify and deduplicate erroneous UMIs for other applications. I realize this may be out of scope, but your pipeline appears to be the superior solution for determining UMI duplicate networks!

VPetukhov · 2020-07-14T18:52:52Z

> Would it be feasible to support output of a deduplicated BAM as well?

Unfortunately, it doesn't fit the workflow. Merging duplicated UMIs requires a lot of R calls, while all BAM-related functionality is in C++. So the simplest solution would be to run the UMI correction in R, save the list of CB+Gene+UMI+CorrectedUMI to a file, and then have a C++ script that parses this file and writes out a corrected BAM.
In my experience, writing such a C++ script is generally faster than waiting for Python to do the same :) You basically need to take the BamTools library, iterate over the BAM, update the tags, and save the result to another BAM. Something like ~50 lines of code. Here is an example of iterating over a BAM, and here is another one for editing tags.
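
For illustration, here is a minimal sketch of the non-BAM half of that script: loading the corrections file into a lookup table and resolving a UMI against it. It uses only the standard library. The file layout (one tab-separated `CB`, `Gene`, `UMI`, `CorrectedUMI` record per line) and the names `CorrectionTable`, `make_key`, `read_corrections` and `corrected_umi` are assumptions made for this example, not part of dropEst.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical lookup table: "CB<TAB>Gene<TAB>UMI" -> corrected UMI.
using CorrectionTable = std::unordered_map<std::string, std::string>;

// Builds the lookup key for one (cell barcode, gene, UMI) triple.
std::string make_key(const std::string& cb, const std::string& gene,
                     const std::string& umi) {
    return cb + '\t' + gene + '\t' + umi;
}

// Reads lines of the assumed form "CB<TAB>Gene<TAB>UMI<TAB>CorrectedUMI".
// Lines with fewer than four non-empty trailing fields are skipped.
CorrectionTable read_corrections(std::istream& in) {
    CorrectionTable table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cb, gene, umi, corrected;
        if (std::getline(fields, cb, '\t') &&
            std::getline(fields, gene, '\t') &&
            std::getline(fields, umi, '\t') &&
            std::getline(fields, corrected)) {
            table[make_key(cb, gene, umi)] = corrected;
        }
    }
    return table;
}

// Returns the corrected UMI, or the original UMI if no correction is listed.
std::string corrected_umi(const CorrectionTable& table, const std::string& cb,
                          const std::string& gene, const std::string& umi) {
    auto it = table.find(make_key(cb, gene, umi));
    return it == table.end() ? umi : it->second;
}
```

In the BAM loop itself, each record's cell barcode, gene and UMI tags would be read (BamTools offers `BamAlignment::GetTag` for this), passed through `corrected_umi`, and the UMI tag rewritten with `BamAlignment::EditTag` before the record is saved with `BamWriter::SaveAlignment`. Which tag names hold the barcode, gene and UMI depends on how the BAM was produced, so check them against your own files.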

> I am not interested in scRNA-seq counts but rather the ability of your pipeline to identify and deduplicate erroneous UMIs for other applications.

Do you mean "deduplicate erroneous scRNA-seq UMIs", or is this about a completely different kind of data? The approach should work whenever you have cells, genes and UMIs, but it could perhaps be adapted to other cases as well.

evanbiederstedt added the enhancement label Sep 10, 2020
