Provide optional strand-specificity flags for kallisto bus
#265
Purpose
This pull request extends kallisto bus with optional functionality to associate reads with ECs based on transcript strand. This is largely achieved by reusing the existing code that handles strandedness in kallisto quant. The default behavior is unchanged; the pull request only adds the option to pass the --fr-stranded or --rf-stranded flag to the kallisto bus command.
Flag Orientation
Since the actual sequence read may be the first or second read depending on the technology, I decided to code it such that --fr-stranded means the sequence read is sense with respect to the transcript and --rf-stranded means it is anti-sense. As an example, all 10X Chromium 3'-end versions would use --fr-stranded because the sequence read is sense with respect to the transcript, despite being located in R2.
Future Direction
Perhaps a more practical interface would be to provide only a single flag (e.g., --stranded) and have the direction encoded with the technology info (e.g., like the BC/UMI location info).
Misc
As a technical note, the deletion of lines 1490-91 in ProcessReads.cpp is counterbalanced by uncommenting the findEC step later in the code (1509/1537). As far as I can tell, the earlier call was unnecessary, but it is essential to compute the EC after removing strand ambiguities.
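
To illustrate why the EC has to be computed after the strand filter, here is a minimal sketch of the idea. It is not the code in ProcessReads.cpp: the Strand enum, the Hit struct, and filterByStrand are hypothetical names made up for this example, and the sorted transcript set simply stands in for the EC that findEC would return.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical names throughout; this is NOT kallisto's actual code.
// FR: keep transcripts the sequence read is sense to (--fr-stranded).
// RF: keep transcripts the sequence read is anti-sense to (--rf-stranded).
enum class Strand { Unstranded, FR, RF };

// One compatibility hit for the sequence read: a transcript id and whether
// the read maps in the sense orientation of that transcript.
struct Hit {
    int transcript;
    bool sense;
};

// Drop transcripts whose orientation contradicts the requested strand mode
// and return the remaining transcript ids, sorted and de-duplicated.
// In this sketch the returned set plays the role of the EC, so any EC
// lookup has to happen on this filtered set, not on the raw hits.
std::vector<int> filterByStrand(const std::vector<Hit>& hits, Strand mode) {
    std::vector<int> kept;
    for (const Hit& h : hits) {
        const bool ok = mode == Strand::Unstranded ||
                        (mode == Strand::FR && h.sense) ||
                        (mode == Strand::RF && !h.sense);
        if (ok) {
            kept.push_back(h.transcript);
        }
    }
    std::sort(kept.begin(), kept.end());
    kept.erase(std::unique(kept.begin(), kept.end()), kept.end());
    return kept;
}
```

For a read that is sense to transcript 0 and anti-sense to transcript 1, the unstranded mode keeps both transcripts, FR keeps only transcript 0, and RF keeps only transcript 1. An EC looked up before the filter would still correspond to both transcripts, which is the ambiguity the later findEC call avoids.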