Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata fbc3 #1237

Open
wants to merge 122 commits into
base: devel
Choose a base branch
from
Open

Metadata fbc3 #1237

wants to merge 122 commits into from

Conversation

akaviaLab
Copy link
Contributor

Updated version of #988. Tried to simplify and tidy the code in addition to merging. Modified according to @cdiener requests

  • Ignored changes in notes for now
  • renamed constructors a bit
  • added eq comparisons that build classes if necessary

Original description of #988 follows

Metadata Classes: A separate directory for handling metadata information is made inside cobra/core directory. Every object derived from SBase can have meta information like annotation (CVTerms), notes, history attached to it.
The CVTerms class for storing externally linked resources to each component derived from SBase. This class is maintaining the new format annotation as well as old format annotation simultaneously, and both are kept synchronized with each other. Changing one will modify the other accordingly. This new class for annotation can handle any type of annotation data (be it the case of nested annotation or alternative annotation). It can read the old format as well as the new format annotation data from JSON and other formats. At the time of writing back the model, the new data format is used because it contains the complete data organized in the same way as SBML.
The History class used for storing the history, validating dates, etc, is now attached to each component derived from SBase.
The KeyValuePair class for storing key-value pairs, defined by fbc-v3.
The last three metadata objects (i.e CVTerms, History, KeyValuePair) are present inside a single attribute of SBase (Object) class and can be accessed via object.annotation.cvterms, object.annotation.history and object.annotation.key_value_data attributes. Calling simply the annotation attribute (object.annotation) will return the annotation data in old format (making it backward compatible).

Group to JSON: The support of the group package is extended to JSON.

JSON schema v2: The version2 of JSON schema has been added which defines the new format annotation, history, key-value pair, notes, group package data, user-defined constraints data and basic SBML info.
@akaviaLab comment - not sure if the v2 schema works or is done. Seems unclear from the code.

Issues Fixed
fixes issue #954: The support of group package has been extended to JSON and other formats.
fixes issue #856: The infinity values are also enabled for storing bounds.
fixes issue #810: The sbml_info storing basic information of SBML is written to JSON to store the basic SBML document information like packages, level, version, notes, annotation attached to the SBML component etc.
fixes issue #736: If annotation is present in the form of list of list, it is first modified and then data is read.
fixes issue #706: Single annotation resource are now put into a string. While reading from JSON also, if a single resource attribute is in the form of string, it is first fixed to list and then read into the model.
fixes issue #684: The complete metadata structure (CVTerms, Notes, History) has been redesigned with backward compatibility. Old format data is read, fixed and then written in new format metadata structures. The fbc-v3 "KeyValuePair" class is also a part of this new format annotation. Its corresponding class and its support to JSON and other format has been added. However, the SBML parser for it has to be updated when libsbml adds support for fbc-v3.
fixes issue #937: The annotation format has been updated, which is backward compatible too.
JSON schema v1: The JSON schema version 1 is modified to resolve some pre-existing issues.

Tests
Tests for all the newly implemented features are added to check the functionalities. A few old tests are also modified accordingly. Some tests which were initially marked 'xfail' are now working dew to modified formats.

@akaviaLab
Copy link
Contributor Author

Okay. I'll keep that in mind for next issues. Is this what you meant with the creator for a simpler API?

@cdiener
Copy link
Member

cdiener commented Jul 8, 2022

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 12, 2022

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

History accepts str or datetime as a format for the dates.
parse_datetime parses string, and deals with datetime appropriately
So that seems to do what you described already, unless I misunderstood

Creation time now() by default - I can do that.
Creator in config - reasonable, but next PR.

uri.akavia added 2 commits July 12, 2022 13:16
sbml document history is saved
added relevant tests
@matthiaskoenig
Copy link
Contributor

Hi all,
I could review/update the code.

For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in.

A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point?
Best Matthias

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@matthiaskoenig
Copy link
Contributor

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea.

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@akaviaLab
Copy link
Contributor Author

akaviaLab commented Jul 13, 2022 via email

@cdiener
Copy link
Member

cdiener commented Jul 13, 2022

@matthiaskoenig

Key-value pairs will be in there. They are super useful and a regularly requested feature.

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. EDIT: see reply below. Only set on writin gif no date is set.

The other discussion was whether we should limit the format of IDs for the v2 JSON schema. I don't think there is a good solution here, since the SBML and COBRAPY ID requirements are not compatible. So it will come down to whether we want the JSON schema 100% SBML compliant, with the limitation that not all COBRAPY models can be saved to that format.

The rest is just naming of classes etc. to make it a bit more accessible to devs and users without dep knowledge of SBML abstractions.

@akaviaLab
Copy link
Contributor Author

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. So only when explicitly doing model.annotation.history = History(...) the first time and when not providing the date explicitly in the constructor. We could skip this, but I suspect many users will do something like model.annotation.history = History(created_date=datetime.now(), ...). Right now, it will initialize to None if not specified. If you prefer that, we can leave it as is...

Right now, History() is called as part of every annotation, so modifying any default there would lead to every entity having a created date. That's a bad idea.
sbml.py has a small section when writing a document - if that document (not the model, but the document) has no creation date, it will be set to current time.

elif isinstance(sbase, libsbml.SBMLDocument): time = datetime.datetime.now() timestr = time.strftime("%Y-%m-%dT%H:%M:%S%z") date = libsbml.Date(timestr) _check(comp_history.setCreatedDate(date), "set creation date for document")

That's can be removed or stay.

@matthiaskoenig
Copy link
Contributor

matthiaskoenig commented Jul 13, 2022

I would highly prefer to have SBML SIds as valid cobrapy ids.

  • These can be serialized to all formats
  • These work in optlang and the solver interfaces
  • These are valid object identifiers (which is sometimes important)
  • These are very simple to validate with the following regex [a-zA-Z][a-zA-Z0-9_]*
  • Most models already use valid SIds because SBML is the de facto exchange standard

The only exceptions are some databases such as BiGG (no longer maintained, which also provides SBML with valid IDs).

These would allow:

  • to get rid of all code for id replacements besides on the importers for JSON/Matlab
  • to have no id replacements on SBML and valid cobrapy models (these create a ton of issues when mapping data/models, ...) and create a ton of bugs.

I strongly advocate for this solution and just enforce valid SIds and move id_replacement in helper functions which would be applied for import besides SBML (and as toolbox for people to fix there invalid models).

edit: fixed regex

@akaviaLab
Copy link
Contributor Author

@cdiener @matthiaskoenig
Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

@matthiaskoenig
Copy link
Contributor

@cdiener @matthiaskoenig Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

Yes, the documentation is better. But please use a string enum for such enums.
This will allow to parse strings much easier to the actual instances. I.e. something along the lines of

class MyEnum(str, Enum):
    state1 = 'state1'
    state2 = 'state2'

@cdiener
Copy link
Member

cdiener commented Jul 14, 2022

I would highly prefer to have SBML SIds as valid cobrapy ids.

* These can be serialized to all formats

* These work in optlang and the solver interfaces

* These are valid object identifiers (which is sometimes important)

* These are very simple to validate with the following regex `[a-zA-Z][a-zA-Z0-9_]*`

* Most models already use valid SIds because SBML is the de facto exchange standard

I think this is hard because of backwards compatibility as mentioned above, and it would be a bit inconsistent for instance when reading models with JSON v1 schema. But I so see your point and agree that all the SBML-like formats IDs should be read as is and there should be helpers to convert them to that format. Point 2 is not fully correct since there are some IDs that are SBML-compliant but don't work in optlang in some solver interfaces (IDs longer 255 chars in GLPK, or metabolites named "st" in Gurobi for instance). So basically we push the defaults toward SBML-compliant IDs without breaking other formats such as Matlab etc.

uri.akavia added 8 commits July 19, 2022 20:23
…ist - need examples (TODO)

Qualifiers as string enum
CVTermList will output list of dicts, not a dict
update_pickles.py
minor renaming
changed sbml.py according to previous changes. cleaned up processing of history in sbml.py
changed test_metadata.py according to these changes
improve equality of History object
some tidying of metadata.py and adding __eq__ function
added test_old_style_annotation to test_metadata.py
updated some data and tests
Some minor modifications and comments on sbml and test_sbml.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale The issue or pull request lacks activity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants