Should we make BioSequences <: AbstractVector? #173
Comments
Linking the PR that documents the concerns with getting broadcasting to work with sequences, as some of those concerns are relevant here: #171
One potential issue with inheriting from AbstractVector is I don't believe we can simply do |
One alternative, is we could redefine the interface a bit. So as you have |
Let's say we did this, and take the example of IUPAC alphabets. Ok, we could have:
Well some traits and methods might change a bit, but not totally, e.g. |
I did see Maybe another way of conceptualising the I'm not keen on keeping track and supplying two types to I think this idea of placing information on the element would move your I like the idea of expressing the packed encodings as Besides nice syntax for assignments, does broadcasting across packed encoding offer any computational shortcuts, or is it still necessary to unpack to |
I'm not sure about putting the encoding information on the element. BitVector for example does not do that, the array user doesn't have to worry, ok this is a |
So thought about this a bit more today, and I'm more open to the But you got me thinking is it simpler to just have the symbols or elements dictate their own encoding and allowed values. Let's say you have DNA & RNA type symbols, which basically just allow A, T, C, G, their encoding can be 2 bits. And IUPAC_[DNA/RNA] types which are basically the 1-hot encoded DNA and RNA types that BioSymbols currently have. We define conversion and promotion between them so they work seamlessly for operations, comparison, and predicates. The BioSequence types then only need to know one thing - how many bits do these symbol types require? And all the sequence types do is pack or unpack them from their own internal data structure. Then people that want to extend or take advantage of the package only have to define the symbol type e.g. say a "Genotype" for sequences of 0/0, 0/1 for say SNP calls and so on. The author of the symbols defines the encoding with the symbol type itself, and we just have the sequence types pack-em and unpack-em. What do you think of this @jakobnissen? |
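To make the "symbols dictate their own encoding, sequences just pack and unpack" idea concrete, here is a minimal Julia sketch. Every name in it (`Nuc2`, `tochar`, `bits_per_symbol`, `encode`, `decode`, `PackedSeq`) is hypothetical and chosen for illustration; none of it is the BioSequences.jl or BioSymbols.jl API, and the packing layout is just one possible choice.

```julia
# Hypothetical sketch: none of these names are the BioSequences.jl / BioSymbols.jl API.

# A symbol type that owns its encoding: only A, C, G, T, so 2 bits suffice.
struct Nuc2
    code::UInt8  # 0x00 to 0x03
end

const NUC2_CHARS = ['A', 'C', 'G', 'T']

function Nuc2(c::Char)
    i = findfirst(==(c), NUC2_CHARS)
    i === nothing && throw(ArgumentError("not a 2-bit nucleotide: $c"))
    return Nuc2(UInt8(i - 1))
end

tochar(n::Nuc2) = NUC2_CHARS[n.code + 1]

# The only things a sequence type needs to know about a symbol type.
bits_per_symbol(::Type{Nuc2}) = 2
encode(n::Nuc2) = UInt64(n.code)
decode(::Type{Nuc2}, bits::UInt64) = Nuc2(bits % UInt8)

# A sequence that packs any symbol type T into 64-bit words.
struct PackedSeq{T}
    data::Vector{UInt64}
    len::Int
end

function PackedSeq(syms::Vector{T}) where {T}
    b = bits_per_symbol(T)
    per_word = 64 ÷ b
    data = zeros(UInt64, cld(length(syms), per_word))
    for (i, s) in enumerate(syms)
        word, slot = divrem(i - 1, per_word)
        data[word + 1] |= encode(s) << (slot * b)
    end
    return PackedSeq{T}(data, length(syms))
end

Base.length(seq::PackedSeq) = seq.len

function Base.getindex(seq::PackedSeq{T}, i::Integer) where {T}
    1 <= i <= seq.len || throw(BoundsError(seq, i))
    b = bits_per_symbol(T)
    word, slot = divrem(i - 1, 64 ÷ b)
    mask = (UInt64(1) << b) - UInt64(1)
    return decode(T, (seq.data[word + 1] >> (slot * b)) & mask)
end

seq = PackedSeq([Nuc2(c) for c in "GATTACA"])
roundtrip = join(tochar(seq[i]) for i in 1:length(seq))
```

Under this sketch, someone adding a "Genotype" symbol would only define the symbol type plus `bits_per_symbol`, `encode`, and `decode`; `PackedSeq` itself would not change.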
So I originally suggested this in Slack. Just like you, @benjward, it came to me after trying to implement broadcasting for
However, I think using subtyping to solve this kind of interface issue is ultimately a bad solution, generally. It's the one that Julia provides by default, but it's bad nonetheless. The issue is that:
Like you said, @benjward, I think the broader Julia community is beginning to move towards a consensus that one should use duck typing and traits for interfaces where possible, instead of relying on subtyping. It's more important that It might suck right now, because broadcasting in particular is hard to implement for non-array types, but this is a problem with the broadcasting machinery, not with Finally, I also think that the fact that the element type of However, I agree that it's worth thinking about having Right now, however, I think it's not worth it. The current system works pretty well.
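For readers unfamiliar with the traits-over-subtyping approach mentioned above, here is a small Julia sketch of the usual trait-dispatch pattern. The names `SeqStyle`, `IsSeqLike`, `NotSeqLike`, `ToySeq`, and `count_matches` are hypothetical and are not part of BioSequences.jl; the point is only that a type can opt into an interface without being `<: AbstractVector`.

```julia
# Hypothetical trait: "does this type behave like an indexable sequence?"
abstract type SeqStyle end
struct IsSeqLike <: SeqStyle end
struct NotSeqLike <: SeqStyle end

SeqStyle(::Type) = NotSeqLike()                    # default: opt-in only
SeqStyle(::Type{<:AbstractVector}) = IsSeqLike()   # real vectors qualify

# A toy sequence type that is NOT a subtype of AbstractVector.
struct ToySeq
    chars::Vector{Char}
end
Base.length(s::ToySeq) = length(s.chars)
Base.getindex(s::ToySeq, i::Integer) = s.chars[i]
SeqStyle(::Type{ToySeq}) = IsSeqLike()             # opts in via the trait

# Generic code dispatches on the trait, not on the type hierarchy.
count_matches(x, sym) = count_matches(SeqStyle(typeof(x)), x, sym)
count_matches(::IsSeqLike, x, sym) = count(i -> x[i] == sym, 1:length(x))
count_matches(::NotSeqLike, x, sym) =
    throw(ArgumentError("$(typeof(x)) does not implement the sequence interface"))

n_toy = count_matches(ToySeq(['A', 'C', 'A']), 'A')
n_vec = count_matches([1, 2, 1], 1)
```

The trade-off is the one raised in this thread: generic code written against `AbstractVector` will not accept `ToySeq`, so anything outside the trait-aware functions (broadcasting included) has to be supported explicitly.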
Ok, so I didn't go into the cons of this too much because I was still mulling over, but you have hit on the downsides I was considering, and a few more in addition. I looked deeply into broadcasting this week by I believe our sequences really are not that far away from the entire interface (they quack almost exactly like a vector when required!), and that making use of the standard As far as I can tell, if we simply decide "hey, if you are If we decide instead "Hey, if your broadcast involves biosequence arguments & the output is |
The conversation has moved in another direction, but let me offer another analogy for the It's like describing In terms of This is likely relevant to BioJulia/BioSymbols.jl#42. |
Gonna close this, as I believe the answer to the AbstractVector question is "No".
So I'm just making an actual issue to discuss, and to document the discussion of, a prospect that was raised on Slack.
Because as I'm going through broadcasting, I'm seeing that there are some arguments in the pro column for doing this.