How to store and represent and compare non collated single value attributes in a sequence collection #57
I like the simplicity that requiring an array at level 2 provides, and also the simplicity of always using the digest algorithm to go from level 2 to level 1. If, for instance, the value at level 2 is a string, it could still be a novel-length string, while the digest at level 1 will always be digest-sized. However, one solution to your issue might be to add an array property that specifies the level of expansion, so that you can stop expanding your
We can definitely discuss this. I can share how I implemented it in my demo implementation, in case it's useful:
So in your case, I would have used a vanilla string for "single-value-attribute" -- and I would not have digested this, so there would be no recursion to retrieve the value. The way I did this was actually using sveinung's second insight: I have a property in the schema that indicates which attributes are digested. This is basically explained in my henge tutorial. While I can see some rationale in the simplicity of just saying "everything is an array, and everything gets digested" -- I think this is unnecessary overhead for many single-value attributes.
Perhaps this should also be in the standard?
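To make the idea concrete, here is a minimal sketch of going from level 2 to level 1 when only some attributes are digested. It is not the henge implementation: the function names (`sha512t24u`, `canonical`, `to_level1`), the `digested` set, and the sample values are assumptions for illustration, and compact key-sorted JSON is used as a stand-in for whatever canonicalization the spec settles on.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, then base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical(value) -> bytes:
    # Assumption: compact, key-sorted JSON stands in for a real canonicalization scheme.
    return json.dumps(value, separators=(",", ":"), sort_keys=True).encode("utf-8")


def to_level1(level2: dict, digested: set) -> dict:
    """Digest only the attributes named in `digested`; pass the rest through unchanged."""
    return {
        name: sha512t24u(canonical(value)) if name in digested else value
        for name, value in level2.items()
    }


# Hypothetical level 2 representation with one array and one plain string.
level2 = {
    "lengths": [1216, 970, 1788],
    "single-value-attribute": "some plain value",
}
level1 = to_level1(level2, digested={"lengths"})
```

With this approach the single-value attribute looks the same at level 1 and level 2, so retrieving it needs no recursion, while `lengths` is replaced by a fixed-size digest.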
I personally think this is what makes the most sense. But this would raise the issue of how to declare such an attribute in the service info.
lengths:
  type: array
  collated: true
  description: "Number of elements, such as nucleotides or amino acids, in each sequence."
  items:
    type: integer
single-value-attribute:
  type: string
  collated: false
  description: ""
At one point I was using the keyword
lengths:
  type: array
  collated: true
  digested: true # maybe left off as default?
  description: "Number of elements, such as nucleotides or amino acids, in each sequence."
  items:
    type: integer
single-value-attribute:
  type: string
  collated: false
  digested: false # could be specified here, when necessary?
  description: ""
For comparison: I propose we make these changes to the comparison function result:
Then, we may want to introduce new functionality to the comparison result, such as
My notes from Oct 18th say what we came to was: Changes to be made to comparison function:
This makes the comparison function terminology more in line with collections that include single-value attributes. The only other thing to decide is: how do you know what to digest and what not to digest? I guess there are two possibilities:
Would the latter work? It would prevent you from digesting a singleton. Is this just an implementation detail or must this be part of the spec?
Our decision was to adopt option 3 with one change: add
@tcezard is this issue solved, to the point that this can be closed?
I would like to store metadata attributes that have only a single value in a sequence collection.
How do we see them being represented in the JSON at level 1 and level 2?
Represent them similarly to other attributes
At level 2, are we storing them in a single-value array anyway,
or directly as plain text?
Then at level 1, would they be digested similarly to the other attributes?
Represent them as a single value at every level
Alternatively, they could be exposed in plain text directly at level 1 and level 2 with no changes
Level 2
Then level 1
Comparison
The comparison result seems highly dependent on the representation at level 2. If we choose a single-value array as the level 2 representation, then the comparison can be done in the same way as with the other attributes.
Other representations might require different infrastructure.
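As a rough illustration of why the single-value array option needs no extra infrastructure, the sketch below runs a generic array comparison on one-element arrays. The function `compare_arrays`, its result keys, and the accession strings are hypothetical and only loosely inspired by the comparison function discussed in this thread; they are not the spec's output format.

```python
from collections import Counter


def compare_arrays(a: list, b: list) -> dict:
    """Count elements in each array, elements shared by both, and check their order."""
    shared = Counter(a) & Counter(b)  # multiset intersection
    n_shared = sum(shared.values())
    if n_shared == 0:
        same_order = None  # order is undefined when nothing is shared
    else:
        # Simplification: compares the shared elements in their order of appearance.
        same_order = [x for x in a if x in shared] == [x for x in b if x in shared]
    return {"a": len(a), "b": len(b), "a_and_b": n_shared, "same_order": same_order}


# A single-value attribute stored as a one-element array at level 2.
match = compare_arrays(["accession-A"], ["accession-A"])
mismatch = compare_arrays(["accession-A"], ["accession-B"])
```

The same function handles `lengths`-style arrays and one-element arrays alike; a plain-string representation would instead need a separate equality check.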
Use cases
There are many use cases for single-value attributes, like assembly-accession or naming-authority.
But the one use case I have in mind is to store sorted-sequences as a single level 1 digest. Since I won't need the detail of sequences already stored in the sequences attribute, I can relatively cheaply have an order-relaxed comparison on any attribute by comparing the level 1 digest, without storing the underlying array.
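A minimal sketch of that use case, under stated assumptions: the helper names (`sha512t24u`, `canonical`, `sorted_sequences_digest`) and the placeholder sequence digests are made up for illustration, and compact JSON stands in for the real canonicalization step.

```python
import base64
import hashlib
import json


def sha512t24u(data: bytes) -> str:
    """SHA-512 digest truncated to 24 bytes, then base64url-encoded."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")


def canonical(value) -> bytes:
    # Assumption: compact JSON stands in for a real canonicalization scheme.
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def sorted_sequences_digest(sequence_digests: list) -> str:
    """Digest the sorted array so that the result ignores the original order."""
    return sha512t24u(canonical(sorted(sequence_digests)))


# Two collections holding the same sequences in a different order.
a = ["seq-digest-2", "seq-digest-1", "seq-digest-3"]
b = ["seq-digest-3", "seq-digest-2", "seq-digest-1"]
same_content = sorted_sequences_digest(a) == sorted_sequences_digest(b)
```

Only the resulting digest would need to be stored as the sorted-sequences attribute; the sorted array itself can be recomputed from sequences when needed.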