I would like two more duplicate-removal options.
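`--removeDuplicatesProb` will remove duplicates by sequence, but will store the MD5 sum of each sequence seen rather than the sequence itself. That makes it only probabilistic: two different sequences that happened to have the same MD5 sum would be treated as duplicates. In exchange, it helps to avoid running out of memory.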
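A minimal sketch of the idea, assuming reads arrive as `(read_id, sequence)` string pairs. The function name `remove_duplicates_prob` and that input shape are hypothetical, chosen for illustration only, and are not taken from the existing code:

```python
import hashlib


def remove_duplicates_prob(reads):
    """Yield (read_id, sequence) pairs, skipping any read whose sequence
    has an MD5 digest that was already seen.

    Hypothetical helper: only the 16-byte digests are kept in memory,
    not the sequences themselves.
    """
    seen = set()
    for read_id, sequence in reads:
        digest = hashlib.md5(sequence.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield read_id, sequence
```

Each stored digest is 16 bytes regardless of read length, which is where the memory saving would come from. The first read with a given sequence is kept and later ones are dropped.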
`--removeDuplicatesByShortId` will de-duplicate based on the first part of the read id (up to the first space, if any). This is needed because if you combine output from (say) BLAST or DIAMOND with output from an aligner that produces SAM/BAM, the read ids won't match: in a SAM/BAM file, read ids only run up to the first space. We need this option to be able to de-duplicate combined reads from these different matchers.
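Roughly what I have in mind, again assuming `(read_id, sequence)` string pairs. The name `remove_duplicates_by_short_id` is hypothetical and just for illustration:

```python
def remove_duplicates_by_short_id(reads):
    """Yield (read_id, sequence) pairs, skipping any read whose short id
    (the id up to the first space, if any) was already seen.

    Hypothetical helper: the full original id of the first occurrence
    is preserved in the output.
    """
    seen = set()
    for read_id, sequence in reads:
        # An id with no space is its own short id.
        short_id = read_id.split(" ", 1)[0]
        if short_id not in seen:
            seen.add(short_id)
            yield read_id, sequence
```

With this, a read with id `read1 length=100` from BLAST or DIAMOND output and a read with id `read1` from a SAM/BAM file would be treated as the same read, and only the first one encountered would be kept.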