I see that the dropEst program reports a matrix of counts for genes in the input GTF. Would it be feasible to support output of a deduplicated BAM as well? I am not interested in scRNA-seq counts but rather the ability of your pipeline to identify and deduplicate erroneous UMIs for other applications. I realize this may be out-of-scope but your pipeline appears to be the superior solution for determining UMI duplicate networks!
Would it be feasible to support output of a deduplicated BAM as well?
Unfortunately, it doesn't fit the workflow. Merging duplicated UMIs requires a lot of R calls, but all BAM-related functionality is in C++. So the simplest solution would be to run the UMI correction in R, save the list of CB+Gene+UMI+CorrectedUMI to a file, and then have a C++ script that parses this file and writes out a corrected BAM.
In my experience, writing such a C++ script is generally faster than waiting for Python to do the same :) You basically need to take the BamTools library, iterate over the BAM, update the tags, and save the result to another BAM. It's something like ~50 lines of code. Here is an example of iteration over bam, and here is another one for editing tags.
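As a rough illustration of the "parse the file" half of that script, here is a minimal sketch using only the standard library. The file layout (one tab-separated `CB`, `Gene`, `UMI`, `CorrectedUMI` record per line) and the names `CorrectionTable`, `makeKey`, `loadCorrections` and `correctedUmi` are assumptions made for this example; they are not part of dropEst or BamTools. The BAM reading and writing itself is left out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Assumed (hypothetical) table format, one record per line, tab-separated:
//   CB <TAB> Gene <TAB> UMI <TAB> CorrectedUMI
using CorrectionTable = std::unordered_map<std::string, std::string>;

// Join the three key fields with a tab; fields must not contain tabs themselves.
std::string makeKey(const std::string& cb, const std::string& gene,
                    const std::string& umi) {
    assert(cb.find('\t') == std::string::npos &&
           gene.find('\t') == std::string::npos &&
           umi.find('\t') == std::string::npos);
    return cb + '\t' + gene + '\t' + umi;
}

// Read the correction table from any input stream (file, string, ...).
CorrectionTable loadCorrections(std::istream& in) {
    CorrectionTable table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string cb, gene, umi, corrected;
        if (std::getline(fields, cb, '\t') &&
            std::getline(fields, gene, '\t') &&
            std::getline(fields, umi, '\t') &&
            std::getline(fields, corrected)) {
            table[makeKey(cb, gene, umi)] = corrected;
        }
        // Lines with fewer than four fields are silently skipped.
    }
    return table;
}

// Return the corrected UMI, or the original UMI if no correction is recorded.
std::string correctedUmi(const CorrectionTable& table, const std::string& cb,
                         const std::string& gene, const std::string& umi) {
    auto it = table.find(makeKey(cb, gene, umi));
    return it == table.end() ? umi : it->second;
}
```

Inside the loop over the BAM, each alignment's cell barcode, gene and UMI tag values would be passed to `correctedUmi`, and the returned value written back to the UMI tag before saving the alignment to the output file.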
I am not interested in scRNA-seq counts but rather the ability of your pipeline to identify and deduplicate erroneous UMIs for other applications.
Do you mean "deduplicate erroneous scRNA-seq UMIs", or is it about some completely different kind of data? The approach should work whenever you have cells, genes and UMIs. But maybe it can also be adapted to other cases.