
Provide optional strand-specificity flags for kallisto bus #265

Open: wants to merge 4 commits into base old_aa_structure


@mfansler commented Jun 1, 2020

Purpose

This pull request extends kallisto bus with an option to assign reads to equivalence classes (ECs) based on transcript strand. It largely reuses the code that already handles strandedness in kallisto quant. The default behavior is unchanged. Strand-specific assignment is enabled by passing the new --fr-stranded or --rf-stranded flag to kallisto bus.

Flag Orientation

Because the sequence read (as opposed to the barcode/UMI read) may be either the first or the second read depending on the technology, I implemented the flags so that --fr-stranded means the sequence read is sense to the transcript and --rf-stranded means it is antisense. For example, all 10x Chromium 3' versions would use --fr-stranded, because the sequence read is sense to the transcript even though it is located in R2.
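To make the intended semantics concrete, here is a minimal C++ sketch of how the two flags could map onto a per-hit compatibility test. The names StrandMode, parseStrandFlag, and strandCompatible are hypothetical and do not correspond to kallisto's actual identifiers. The sketch assumes readIsSense already describes the sequence read (whichever of R1/R2 the technology designates), not necessarily R1.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical strand modes; not kallisto's real types.
enum class StrandMode { Unstranded, FR, RF };

// Map the proposed command-line flag to a mode (empty = default, unstranded).
StrandMode parseStrandFlag(const std::string& flag) {
    if (flag.empty()) return StrandMode::Unstranded;
    if (flag == "--fr-stranded") return StrandMode::FR;
    if (flag == "--rf-stranded") return StrandMode::RF;
    throw std::invalid_argument("unknown strand flag: " + flag);
}

// readIsSense: true when the *sequence read* aligns in the same orientation
// as the transcript. --fr-stranded keeps sense hits, --rf-stranded keeps
// antisense hits, and the default keeps everything.
bool strandCompatible(StrandMode mode, bool readIsSense) {
    switch (mode) {
        case StrandMode::FR: return readIsSense;
        case StrandMode::RF: return !readIsSense;
        default:             return true;
    }
}
```

Under this reading, a 10x 3' run would call parseStrandFlag("--fr-stranded") and keep only hits where R2 is sense to the transcript.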

Future Direction

Perhaps a more practical interface would be a single flag (e.g., --stranded), with the direction encoded in the technology specification, the same way the BC/UMI locations already are.
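A rough sketch of that alternative, assuming a hypothetical TechSpec entry per technology that records which file holds the sequence read and whether that read is sense to the transcript. The table contents are illustrative only (the 10x entries follow the R2-is-sense observation above), and none of these names exist in kallisto.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical types for illustration; not kallisto's technology structures.
enum class StrandMode { Unstranded, FR, RF };

struct TechSpec {
    int seqFile;      // input file holding the sequence read (0 = R1, 1 = R2)
    bool seqIsSense;  // is the sequence read sense to the transcript?
};

// Illustrative registry: 10x 3' chemistries carry the sequence read in R2, sense.
const std::map<std::string, TechSpec>& techTable() {
    static const std::map<std::string, TechSpec> table = {
        {"10XV2", {1, true}},
        {"10XV3", {1, true}},
    };
    return table;
}

// With a single --stranded switch, the direction comes from the technology.
StrandMode strandModeFor(const std::string& tech, bool stranded) {
    if (!stranded) return StrandMode::Unstranded;
    auto it = techTable().find(tech);
    if (it == techTable().end())
        throw std::invalid_argument("no strand info for technology: " + tech);
    return it->second.seqIsSense ? StrandMode::FR : StrandMode::RF;
}
```

The appeal of this design is that users would no longer need to know which physical read carries the transcript sequence; the technology entry would already know.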

Misc

As a technical note, the deletion of lines 1490-91 in ProcessReads.cpp is balanced by uncommenting the later findEC step (lines 1509/1537). As far as I can tell, the earlier call was unnecessary, whereas computing the EC after strand ambiguities have been removed is essential.
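The ordering matters because an EC is derived from the set of candidate transcripts. The simplified C++ model below (hypothetical types, not kallisto's data structures) intersects per-k-mer transcript sets and, when strand filtering is on, first drops hits with the wrong orientation. An EC looked up before that filtering would still contain the opposite-strand transcripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical model: each k-mer of a read hits a list of (transcript, orientation).
struct KmerHit { int transcript; bool sense; };
using KmerHits = std::vector<KmerHit>;

// Sorted, de-duplicated transcripts for one k-mer, optionally strand-filtered.
static std::vector<int> transcriptsOf(const KmerHits& hits, bool filter, bool wantSense) {
    std::vector<int> ts;
    for (const KmerHit& h : hits)
        if (!filter || h.sense == wantSense) ts.push_back(h.transcript);
    std::sort(ts.begin(), ts.end());
    ts.erase(std::unique(ts.begin(), ts.end()), ts.end());
    return ts;
}

// Intersect the per-k-mer sets. Filtering happens before the intersection,
// so the returned set (the EC key) excludes wrong-strand transcripts.
std::vector<int> computeEC(const std::vector<KmerHits>& read, bool filter, bool wantSense) {
    if (read.empty()) return {};
    std::vector<int> ec = transcriptsOf(read[0], filter, wantSense);
    for (std::size_t i = 1; i < read.size() && !ec.empty(); ++i) {
        std::vector<int> next = transcriptsOf(read[i], filter, wantSense);
        std::vector<int> out;
        std::set_intersection(ec.begin(), ec.end(), next.begin(), next.end(),
                              std::back_inserter(out));
        ec.swap(out);
    }
    return ec;
}
```

For a read whose k-mers hit transcript 1 in sense and transcript 2 in antisense, the unfiltered EC is {1, 2}, while the sense-only EC is {1}.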

@mfansler changed the base branch from master to devel on June 1, 2020 01:58
@mfansler (Author)

The latest commit adds functionality to the BAM reader so that sequences stored as reverse complements of the original read are transformed back to their (presumably) original orientation. This is essential for strand-specific pseudoalignment when starting from a mapped BAM; otherwise, reads aligned to the reverse strand come in with the wrong orientation.

Unlike the previous changes, which were added as new non-default options, this change is a new default for the BAM reader. I believe it is the ideal behavior for loading BAMs, and the computational overhead it adds is a check of the BAM_FREVERSE flag for each read. Please let me know if there are other reasons it should not be the default.
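The BAM change rests on the SAM specification's rule that SEQ is stored on the forward reference strand, so a record with FLAG bit 0x10 (htslib's BAM_FREVERSE) holds the reverse complement of the read as sequenced. Below is a self-contained C++ sketch of the recovery step. It defines the 0x10 constant locally instead of linking htslib, and the function names are illustrative, not the PR's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same bit as htslib's BAM_FREVERSE (0x10), defined here to stay self-contained.
constexpr std::uint16_t kFlagReverse = 0x10;

// Reverse complement; any non-ACGT character (e.g. N) becomes N.
std::string reverseComplement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Recover the read as sequenced: undo the reverse complement only when the
// record's FLAG has the reverse-strand bit set.
std::string originalReadSequence(const std::string& storedSeq, std::uint16_t flag) {
    return (flag & kFlagReverse) ? reverseComplement(storedSeq) : storedSeq;
}
```

Forward-strand records pass through untouched, which is why the per-read cost for those is just the flag test.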

@mfansler (Author) commented Mar 4, 2021

Any feedback on this?
