Encourage the use of dedicated fields rather than semantic overloading #27
Comments
Very good points, @draeger. To be sure that I understand correctly, are you suggesting that metabolite identifiers in a model should not contain any compartment information? If so, I suspect this would create problems for software/formats that expect or require that the metabolite identifiers be unique. Or are you suggesting that the metabolite identifiers, which may contain a compartment abbreviation, should not be used to extract such information? |
I think it's a good idea to propose a change like this for a forward-looking recommendation like standard-GEM, even though it will make it even harder for tools to read models, since there will be thousands of "old-style" models and a few new ones 😝 |
@JonathanRob, an ID can, of course, contain any prefix, infix, or suffix that is commonly allowable. If it is an abbreviation for a compartment, that is fine. However, software should not rely on a specific pattern for an identifier. If a metabolite has the ID |
Ok, yes, I absolutely agree. |
@draeger Very good point. Besides the case of compartment information in metabolite IDs, are there other cases that you are aware of?
@Midnighter Reading should be fine: use of the compartment field is part of SBML L3V1 + FBCv2 (maybe even earlier?), and at least COBRA Toolbox, cobrapy and RAVEN can deal with this. Once loaded, however, COBRA still appends the compartment information to the metabolite ID instead of having a dedicated compartment field, but there has been some progress on including a |
I think @matthiaskoenig would have some ideas on this topic, too. |
Not sure what to add here.
|
I think it's worth talking about how to best organize "display names" or "symbols" for model elements. I know that being able to use most of the BiGG identifiers interactively in cobrapy is a major selling point for it. (Basically, being able to call methods on an element such as
|
We have three different things that people seem to like to represent with identifiers or names:
Now, the identifier could be an arbitrary
In any case, I'd recommend only keeping a very fundamental abbreviation and algorithmically appending any information, such as the compartment code, because if the model is changed, that information is likely to get out of sync. If only an abbreviation, such as
As @matthiaskoenig points out, if software internally stores multiple different kinds of information in one variable at once, this is a tool problem and should be solved by adding additional variables to that software (which is much easier to do than complicated and highly error-prone
To answer @edkerk's question: There can be many different things encoded in some identifiers, such as compartment, tissue, reaction types, etc. Here is an overview: https://github.com/SBRG/bigg_models/wiki/BiGG-Models-ID-Specification-and-Guidelines. |
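The idea of keeping only a base abbreviation and deriving the compartment-decorated display name on demand can be sketched as follows. This is a minimal illustration, not cobrapy's API: the `Metabolite` class, its field names, the `display_name` helper, and the identifier `M_0001` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metabolite:
    """Hypothetical container; not a cobrapy class."""
    identifier: str    # opaque unique key, never parsed for meaning
    abbreviation: str  # base symbol only, e.g. "glc"
    compartment: str   # dedicated field, e.g. "c"


def display_name(met: Metabolite, style: str = "underscore") -> str:
    """Build a display name from dedicated fields instead of storing it."""
    if style == "underscore":
        return f"{met.abbreviation}_{met.compartment}"
    if style == "bracket":
        return f"{met.abbreviation}[{met.compartment}]"
    raise ValueError(f"unknown display style: {style!r}")


met = Metabolite(identifier="M_0001", abbreviation="glc", compartment="c")
print(display_name(met))             # glc_c

# Moving the metabolite only touches the dedicated field; the display
# name is recomputed and cannot get out of sync with the model.
met.compartment = "e"
print(display_name(met, "bracket"))  # glc[e]
```

Because the decorated name is computed rather than stored, changing the compartment never requires renaming the identifier.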
@Midnighter Yes, it would be a selling point if it worked. With the current BiGG identifiers this does not work consistently (see opencobra/cobrapy#828). With valid SBML identifiers code completion just works, you just use
SBML SIds are valid variable names! This is one of the big selling points for them. Storing the additional information is not too complicated. One can use the |
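To make the point about SIds concrete, here is a small sketch that checks strings against the SBML `SId` syntax (a letter or underscore followed by letters, digits, or underscores) and against Python's own rules for attribute-style access. The helper names and the candidate strings are illustrative assumptions, not part of any library.

```python
import keyword
import re

# SBML SId syntax: ( letter | '_' ) ( letter | digit | '_' )*
SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier: str) -> bool:
    """True if the whole string matches the SId grammar."""
    return SID_PATTERN.fullmatch(identifier) is not None


def is_attribute_friendly(identifier: str) -> bool:
    """True if the string can be typed as `obj.<identifier>` in Python."""
    return identifier.isidentifier() and not keyword.iskeyword(identifier)


# Illustrative strings only; the last two break the SId grammar.
for candidate in ["M_glc_c", "glc__D_c", "10fthf_c", "glc[c]"]:
    print(candidate, is_valid_sid(candidate), is_attribute_friendly(candidate))
```

One caveat of this sketch: a string such as `class` satisfies the SId grammar but is a reserved word in Python, so the two checks are close but not identical.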
I strongly suggest that if standard-GEM recommends using key-value pairs, it should also recommend one specific key. I don't care what it's called, but it is better to settle on one key early on. |
@Midnighter The complete key-value pair feature is already in the open pull request for cobrapy from GSOC (https://github.com/Hemant27031999/cobrapy/blob/metadata_fbc3_group/src/cobra/core/metadata/keyvaluepairs.py) |
Description of the issue:
Traditionally, identifiers have been used to encode various types of information, e.g., a metabolite's compartment. While this approach might lead to recognizable displays, it causes several problems in information storage and is an avoidable source of error and ongoing debates. It should be noted that display names and scientific names can alternatively be encoded in different ways.
Expected feature/value/output:
Working with models would become easier if identifiers could be treated as what they are: unique `String` keys to identify particular objects within a model. We should avoid expecting anything else from them. Using them as display names or for any further semantic information often leads to misunderstanding and complicated parsing routines. Trying to validate their assumed structure can lead to incompatibilities with models from scientists outside of the COBRA community.
Current feature/value/output:
Let's use only dedicated fields to store the semantics of a model. For instance, the compartmental localization of a metabolite can unambiguously be stored using the `compartment` attribute of `species`, so that we don't need to rely on any suffix of an identifier. If a display name is needed as an abbreviation, it can, for instance, be stored using the new key-value pair structure in the upcoming FBC version 3 package. To avoid redundancy, one could, for example, keep an abbreviation for a metabolite (e.g., `glc`) and algorithmically append `_c`, `[c]`, or any other representation for the compartment depending on user preferences.
Reproducing these results:
Many sources of error could be eliminated. Identifier parsing is cumbersome and error-prone. It should be avoided. It is complicated and compromises the reproducibility and interoperability of models.
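As a rough illustration of reading the compartment from the dedicated field rather than from an identifier suffix, the sketch below parses a toy SBML Level 3 Version 1 fragment with Python's standard library. The model content and the function name `species_compartments` are made up for this example; a real tool would more likely use libSBML or cobrapy.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Toy document (assumption). The second species ID ends in "_c" although
# the species lives in compartment "e", so a suffix parser would be wrong.
TOY_SBML = f"""<sbml xmlns="{SBML_NS}" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="c" constant="true"/>
      <compartment id="e" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="M_0001" name="D-glucose" compartment="c"
               hasOnlySubstanceUnits="false" boundaryCondition="false"
               constant="false"/>
      <species id="glc_c" name="D-glucose" compartment="e"
               hasOnlySubstanceUnits="false" boundaryCondition="false"
               constant="false"/>
    </listOfSpecies>
  </model>
</sbml>"""


def species_compartments(sbml_text: str) -> dict:
    """Map each species id to the value of its `compartment` attribute."""
    root = ET.fromstring(sbml_text)
    namespaces = {"sbml": SBML_NS}
    return {
        species.get("id"): species.get("compartment")
        for species in root.iterfind(".//sbml:species", namespaces)
    }


print(species_compartments(TOY_SBML))  # {'M_0001': 'c', 'glc_c': 'e'}
```

The identifiers are treated purely as keys here; no part of them is inspected to find out where a metabolite is located.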