Supported opposite-strand protocols (for metagene) #23

sarahhcarl · 2018-03-13T15:29:31Z

Hi Plastid team,

I have found a lot of the functionality provided by plastid very useful, but I got frustratingly stuck today when I was trying to run metagene count on data originating from a stranded protocol that produces reads on the opposite of the original strand. I know in your FAQ you suggest reverse-complementing the reads and re-aligning them, but honestly I feel like this is a lot more work than necessary, especially for a big project with many samples.

This issue was almost a deal-breaker for me, but then I realized I could get the same effect by simply reversing the strand of all of my window annotations in my ROI file. I did it with sed, but it would seem to be a pretty straightforward feature to add as a parameter in the actual command, either at the stage of generating the ROI or on the fly during counting. Such a feature would be a relatively easy way to expand usage of plastid by supporting a large number of users with this type of data.

Thanks!
Sarah

joshuagryphon · 2018-03-13T18:48:09Z

Hi @sarahhcarl ,

Thank you for your thoughts. I agree that re-aligning and reprocessing is more work than it is worth, especially for large datasets, and especially when dUTP-based and other reverse-strand sequencing methodologies are common.

I have written kind of a long post in response (below); the TLDR is, I have been thinking about how to do this for a while, and am taking suggestions on how best to support this. Please read on if you have thoughts!

Thanks & best,
Josh

There are a couple of reasons we haven't included a strand-flipping function yet — even though I actually started working on it two years ago (!), without finishing it — but the main theme is that we haven't figured out the best way to solve the problem, "best" meaning:

in the least annoying way possible from a user's point of view
in a way that is most consistent throughout the code base, so that we don't surprise ourselves by creating counterintuitive behaviors in existing or future functionality

If you have thoughts on how best to do this, I welcome your feedback. Here's what I have been considering:

As you note, this sort of flipping functionality could be accomplished by flipping the strands of (a) features (e.g. genes, as you did), (b) read alignments, or (c) enabling both, allowing the user to choose.
- Code-wise, flipping the features is the fastest, easiest, and most isolated
- However, flipping reads seems to be the most intuitive (at least to me), because as a biologist, I think of reads as deriving from a given mRNA or strand of DNA, and want the reads to be represented on that strand in e.g. a genome browser, or in other analyses.
If we go with the read flipping strategy, there are a bunch of places where code changes could be made to support that feature:
- in the mapping functions (what I'm leaning towards). This would be the most explicit, and, philosophically, is most in tune with Plastid's data model. But doing so would mess up functionality / require code duplication in Plastid's BAMGenomeArray, among other places
- inside the various classes of GenomeArray objects. This would result in fewer code changes, and cleaner code overall, but probably makes less sense to the user because flipping the strand is really in the purview of the mapping function.
- inside the script itself, like metagene. I'd rather not do this, because (a) the feature would have to be duplicated across all of the scripts, and (b) the feature wouldn't be usable inside interactive environments (e.g. Jupyter notebooks, which a lot of people use)
When implementing this feature, I want to solve another related problem (and a common ask), to support reads from legacy unstranded sequencing protocols. To minimize risk of having to make breaking changes later on, it makes sense to come up with a design that deals with both of these issues holistically, and I don't have a sense of what the best set of behaviors would be.

sarahhcarl · 2018-03-15T11:44:25Z

Hi Josh,

Thanks for the thoughtful reply. After thinking about it some more and reading the genome_array.py code (not exhaustively, so I may have missed something!), it seems to me like the most intuitive way to go would be to neither explicitly flip the strands of the reads nor the features, but rather to change the logic of read filtering. Maybe this is what you were getting at in point 2a ("in the mapping functions").

For example, it seems like one critical point in the get_reads_and_counts function is here:

 reads = itertools.chain.from_iterable((X.fetch(reference=chrom,
                                               start=start,
                                               end=end,
                                               # until_eof=True, # this could speed things up. need to test/investigate
                                               ) for X in self.bamfiles))
            
        # filter by strand
        if strand == "+":
            reads = ifilter(lambda x: x.is_reverse is False,reads)
        elif strand == "-":
            reads = ifilter(lambda x: x.is_reverse is True,reads)

What about making an additional mapping parameter that would allow this filtering step to either choose reads on the same strand, reads on the opposite strand, or all reads regardless of strand? I don't know how many functions would need this to be implemented (how much code would be duplicated), but to me it seems like a straightforward solution that also supports unstranded protocols.

Cheers,
Sarah

joshuagryphon · 2018-03-20T00:48:54Z

Hi Sarah,

Thank you for your thoughts! I have given this a little bit of work and have a working implementation. I moved the logic you highlighted above into the mapping functions (FivePrimeMapFactory(), and the like), which now take an extra parameter called flip_strand.

I just need to write a few more tests to make sure there are no hidden surprises, and make a few other changes to keep up with updates to pysam. I'll let you know when the changes are done and have been merged back into the develop branch.

Cheers,
Josh

joshuagryphon · 2018-06-13T00:59:12Z

Hi Sarah-
I just wanted to let you know that this hasn't fallen off the radar. I haven't had a chance to get back to my tests due to work pressures. If it's still useful, I can push the feature branch containing my implementation up here, with the caveat that until I'm done building the test datasets & rewriting the unit tests, the implementation won't be proven

sarahhcarl · 2018-06-21T15:01:25Z

Hi Josh, At the moment I don't need it urgently, so I don't think you need to rush - although maybe other users would find it helpful. Thanks for touching base again though! Cheers, Sarah

…

On Wed, Jun 13, 2018 at 2:59 AM Joshua Griffin Dunn < ***@***.***> wrote: Hi Sarah- I just wanted to let you know that this hasn't fallen off the radar. I haven't had a chance to get back to my tests due to work pressures. If it's still useful, I can push the feature branch containing my implementation up here, with the caveat that until I'm done building the test datasets & rewriting the unit tests, the implementation won't be proven — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#23 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ADhDIgN90dwnTJTt0PVP0yR83kQs5Ivjks5t8GPhgaJpZM4So5nV> .

joshuagryphon added the enhancement / feature request label Mar 13, 2018

joshuagryphon changed the title ~~Supported opposite-strand protocols for metagene~~ Supported opposite-strand protocols (for metagene) Mar 13, 2018

joshuagryphon added this to the Plastid 0.5.0 milestone Nov 29, 2019

joshuagryphon modified the milestones: Plastid 0.5.0, Plastid 0.6.0 May 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Supported opposite-strand protocols (for metagene) #23

Supported opposite-strand protocols (for metagene) #23

sarahhcarl commented Mar 13, 2018

joshuagryphon commented Mar 13, 2018

sarahhcarl commented Mar 15, 2018 •

edited

Loading

joshuagryphon commented Mar 20, 2018

joshuagryphon commented Jun 13, 2018

sarahhcarl commented Jun 21, 2018 via email

Supported opposite-strand protocols (for metagene) #23

Supported opposite-strand protocols (for metagene) #23

Comments

sarahhcarl commented Mar 13, 2018

joshuagryphon commented Mar 13, 2018

sarahhcarl commented Mar 15, 2018 • edited Loading

joshuagryphon commented Mar 20, 2018

joshuagryphon commented Jun 13, 2018

sarahhcarl commented Jun 21, 2018 via email

sarahhcarl commented Mar 15, 2018 •

edited

Loading