Metadata fbc3 #1237

akaviaLab · 2022-06-19T16:34:30Z

Updated version of #988. Tried to simplify and tidy the code in addition to merging. Modified according to @cdiener requests

Ignored changes in notes for now
renamed constructors a bit
added eq comparisons that build classes if necessary

Original description of #988 follows

Metadata Classes: A separate directory for handling metadata information is made inside cobra/core directory. Every object derived from SBase can have meta information like annotation (CVTerms), notes, history attached to it.
The CVTerms class for storing externally linked resources to each component derived from SBase. This class is maintaining the new format annotation as well as old format annotation simultaneously, and both are kept synchronized with each other. Changing one will modify the other accordingly. This new class for annotation can handle any type of annotation data (be it the case of nested annotation or alternative annotation). It can read the old format as well as the new format annotation data from JSON and other formats. At the time of writing back the model, the new data format is used because it contains the complete data organized in the same way as SBML.
The History class used for storing the history, validating dates, etc, is now attached to each component derived from SBase.
The KeyValuePair class for storing key-value pairs, defined by fbc-v3.
The last three metadata objects (i.e CVTerms, History, KeyValuePair) are present inside a single attribute of SBase (Object) class and can be accessed via object.annotation.cvterms, object.annotation.history and object.annotation.key_value_data attributes. Calling simply the annotation attribute (object.annotation) will return the annotation data in old format (making it backward compatible).

Group to JSON: The support of the group package is extended to JSON.

JSON schema v2: The version2 of JSON schema has been added which defines the new format annotation, history, key-value pair, notes, group package data, user-defined constraints data and basic SBML info.
@akaviaLab comment - not sure if the v2 schema works or is done. Seems unclear from the code.

Issues Fixed
fixes issue #954: The support of group package has been extended to JSON and other formats.
fixes issue #856: The infinity values are also enabled for storing bounds.
fixes issue #810: The sbml_info storing basic information of SBML is written to JSON to store the basic SBML document information like packages, level, version, notes, annotation attached to the SBML component etc.
fixes issue #736: If annotation is present in the form of list of list, it is first modified and then data is read.
fixes issue #706: Single annotation resource are now put into a string. While reading from JSON also, if a single resource attribute is in the form of string, it is first fixed to list and then read into the model.
fixes issue #684: The complete metadata structure (CVTerms, Notes, History) has been redesigned with backward compatibility. Old format data is read, fixed and then written in new format metadata structures. The fbc-v3 "KeyValuePair" class is also a part of this new format annotation. Its corresponding class and its support to JSON and other format has been added. However, the SBML parser for it has to be updated when libsbml adds support for fbc-v3.
fixes issue #937: The annotation format has been updated, which is backward compatible too.
JSON schema v1: The JSON schema version 1 is modified to resolve some pre-existing issues.

Tests
Tests for all the newly implemented features are added to check the functionalities. A few old tests are also modified accordingly. Some tests which were initially marked 'xfail' are now working dew to modified formats.

add an entry to the next release

akaviaLab · 2022-07-08T14:52:57Z

Okay. I'll keep that in mind for next issues. Is this what you meant with the creator for a simpler API?

cdiener · 2022-07-08T17:33:34Z

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

akaviaLab · 2022-07-12T01:28:59Z

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

History accepts str or datetime as a format for the dates.
parse_datetime parses string, and deals with datetime appropriately
So that seems to do what you described already, unless I misunderstood

Creation time now() by default - I can do that.
Creator in config - reasonable, but next PR.

sbml document history is saved added relevant tests

matthiaskoenig · 2022-07-13T08:22:09Z

Hi all,
I could review/update the code.

For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in.

A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point?
Best Matthias

akaviaLab · 2022-07-13T11:10:21Z

Hi Matthias, As far as I remember, there are still some open discussions 1) Should we rename CVTerm to something like standardized? 2) Should creators have only one name, or stay with family name and given name? Should one name be an option? #1237 (comment) 3) Renaming the qualifiers, see #1237 (comment). I think they should stay as is. 4) What should the JSON schema include? See #1237 (comment) and #1240 (reply in thread) The rest is just tidying, code implementation. I'm still working on it, but if you have comments now that would be helpful. Thank you, Uri David

…

On Wed, Jul 13, 2022 at 4:22 AM Matthias König ***@***.***> wrote: Hi all, I could review/update the code. For clarification with the key/value store. This is basically the new fbc-version3 package which just needs a second implementation (which is this implementation) to be officially accepted. I.e. this is basically official as soon as we merge this in develop. So the key-value pair has to stay in. A lot of discussion here. Are there any open things which have to be decided or is everything down to implementation details at this point? Best Matthias — Reply to this email directly, view it on GitHub <#1237 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACQYYZRBY2TPYODYFZSF4NLVTZ4DZANCNFSM5ZGT4REA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

matthiaskoenig · 2022-07-13T15:53:51Z

Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default.

Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea.

akaviaLab · 2022-07-13T16:18:28Z

Right now, the document (libsbml.SBMLDocument), is set to be created at current time if there is no document creation time. Nothing else is created, because having diffs is a bad idea. No entities, and not the libsbml.Model entity either. I can remove this, but this doesn't seem harmful to me.

…

On Wed, Jul 13, 2022 at 11:54 AM Matthias König ***@***.***> wrote: Yep, pretty much. I would probably allow the dates to be provided directly as string and have init convert it to datetime. And maybe have some things initialized to default dates. For instance, the creation date could just be the current time by default. Might also be helpful if the creator info can be provided in the configuration and is then used by default. Please don't set any current times! This will create diffs on your SBML models and model files every time you run your scripts resulting in large git diffs! I.e. don't set a current time as default. Perhaps an option for that, but please not the default behavior. The HistoryElement can be set on all core objects, so having diffs on all objects every time a script is executed is a bad idea. — Reply to this email directly, view it on GitHub <#1237 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACQYYZUZKCSJJCE646NPQYDVT3RBVANCNFSM5ZGT4REA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

akaviaLab · 2022-07-13T16:21:03Z

Okay. I think if this is what the community wants to do, we should define a standard of - What is a valid cobray ID? - How to deal with weird characters like :, that are currently converted to ASCII code. We can decide to have something like SBML_COLON, SBML_SEMICOLON (it might exist, I'm not sure) And get it done **before** cobrapy 1.0 @cdiener @Midnighter @synchon - any comments?

…

On Wed, Jul 13, 2022 at 11:48 AM Matthias König ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/cobra/io/schema_v2.json <#1237 (comment)>: > + } + }, + "required": [ + "key" + ], + "additionalProperties": false + } + } + } + } + }, + "type": "object", + "properties": { + "id": { + "type": "string", + "pattern": "^[a-zA-Z|_][a-zA-Z0-9_]*$" Yes, SIds are valid SBML ids. This can be checked very easily with a regex. Like mentioned above the only id changes are on - JSON import (IDs are fixed to be valid SIds) - MAT import (IDS are fixed to be valid SIds) - OTHER_FORMAT_NOT_SBML (IDS are fixed to be valid SIds) There would be a single function fixing ids, e.g. fix_invalid_ids used in both/all importers to have a consistent behavior across formats. Nothing would be replaced fixed on any exports, because SIds are valid JSON and MAT Ids. — Reply to this email directly, view it on GitHub <#1237 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACQYYZWNYJZ5L32SFOKOXDTVT3QONANCNFSM5ZGT4REA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

cdiener · 2022-07-13T19:04:38Z

@matthiaskoenig

Key-value pairs will be in there. They are super useful and a regularly requested feature.

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. EDIT: see reply below. Only set on writin gif no date is set.

The other discussion was whether we should limit the format of IDs for the v2 JSON schema. I don't think there is a good solution here, since the SBML and COBRAPY ID requirements are not compatible. So it will come down to whether we want the JSON schema 100% SBML compliant, with the limitation that not all COBRAPY models can be saved to that format.

The rest is just naming of classes etc. to make it a bit more accessible to devs and users without dep knowledge of SBML abstractions.

akaviaLab · 2022-07-13T20:00:14Z

The current date was only supposed to use current time as default value for the constructor. So it would only be set the first time the history is created and only if the user does not specify a date. So only when explicitly doing model.annotation.history = History(...) the first time and when not providing the date explicitly in the constructor. We could skip this, but I suspect many users will do something like model.annotation.history = History(created_date=datetime.now(), ...). Right now, it will initialize to None if not specified. If you prefer that, we can leave it as is...

Right now, History() is called as part of every annotation, so modifying any default there would lead to every entity having a created date. That's a bad idea.
sbml.py has a small section when writing a document - if that document (not the model, but the document) has no creation date, it will be set to current time.

elif isinstance(sbase, libsbml.SBMLDocument): time = datetime.datetime.now() timestr = time.strftime("%Y-%m-%dT%H:%M:%S%z") date = libsbml.Date(timestr) _check(comp_history.setCreatedDate(date), "set creation date for document")

That's can be removed or stay.

matthiaskoenig · 2022-07-13T21:24:57Z

I would highly prefer to have SBML SIds as valid cobrapy ids.

These can be serialized to all formats
These work in optlang and the solver interfaces
These are valid object identifiers (which is sometimes important)
These are very simple to validate with the following regex [a-zA-Z][a-zA-Z0-9_]*
Most models already use valid SIds because SBML is the de facto exchange standard

The only exceptions are some databases such as BiGG (no longer maintained, which also provides SBML with valid IDs).

These would allow:

to get rid of all code for id replacements besides on the importers for JSON/Matlab
to have no id replacements on SBML and valid cobrapy models (these create a ton of issues when mapping data/models, ...) and create a ton of bugs.

I strongly advocate for this solution and just enforce valid SIds and move id_replacement in helper functions which would be applied for import besides SBML (and as toolbox for people to fix there invalid models).

edit: fixed regex

akaviaLab · 2022-07-14T01:01:47Z

@cdiener @matthiaskoenig
Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

matthiaskoenig · 2022-07-14T07:27:21Z

@cdiener @matthiaskoenig Is the documentation in af5f3fa for CVTerm and Qualifier better? What the documentation should be in general?

Yes, the documentation is better. But please use a string enum for such enums.
This will allow to parse strings much easier to the actual instances. I.e. something along the lines of

class MyEnum(str, Enum):
    state1 = 'state1'
    state2 = 'state2'

cdiener · 2022-07-14T19:32:37Z

I would highly prefer to have SBML SIds as valid cobrapy ids.

* These can be serialized to all formats

* These work in optlang and the solver interfaces

* These are valid object identifiers (which is sometimes important)

* These are very simple to validate with the following regex `[a-zA-Z][a-zA-Z0-9_]*`

* Most models already use valid SIds because SBML is the de facto exchange standard

I think this is hard because of backwards compatibility as mentioned above, and it would be a bit inconsistent for instance when reading models with JSON v1 schema. But I so see your point and agree that all the SBML-like formats IDs should be read as is and there should be helpers to convert them to that format. Point 2 is not fully correct since there are some IDs that are SBML-compliant but don't work in optlang in some solver interfaces (IDs longer 255 chars in GLPK, or metabolites named "st" in Gurobi for instance). So basically we push the defaults toward SBML-compliant IDs without breaking other formats such as Matlab etc.

…ist - need examples (TODO) Qualifiers as string enum CVTermList will output list of dicts, not a dict update_pickles.py minor renaming changed sbml.py according to previous changes. cleaned up processing of history in sbml.py changed test_metadata.py according to these changes

improve equality of History object some tidying of metadata.py and adding __eq__ function added test_old_style_annotation to test_metadata.py

updated some data and tests

Some minor modifications and comments on sbml and test_sbml.py

Hemant27031999 and others added 30 commits June 4, 2020 22:14

added annotation class

782b645

made some modifications

9709974

made the complete metadata package

a7af772

modified meta_data classes

a191413

modified classes to look exactly like dictionaries and lists

4ea5b3e

modified imports and incorporated SBO term in metadata

7795ea6

made the classed to inherit from MutableSequence and MutableMapping

02345bb

adding backward compatibility

4c6c1ca

added instance

ff72c55

work on annotation structure

989ffe4

adding json example

a4eb840

fixed instance2

cf38e02

work on cvterms

7d490a4

made backward compatible

024c980

code review metadata

ff440fb

cleaned the metadata class by putting code in respective classes

e4ad484

new annotation format supported for SBML to cobra model

2508ac3

added io for json and other formats

d5fc54c

added tests for new annotation format

c9e904b

updated history

a6e2f8a

fixing broken tests

23f9699

commented a few methods

2f55bbf

added equal method inisde metadata classes

5aae1b0

modified directories paths

d467a14

fixed imports and tox tests

e4cc72b

solved the problem of annotation copy

32a81da

new notes format

a07ccaf

code refactoring

f69a55c

modified history, keyvaluepair and notes

e4ab40b

modified tests and imports

8dc96cf

trying to fix 3.6 bug again

8248a8d

uri.akavia added 2 commits July 12, 2022 13:16

some tidying of sbml.py

85bd021

one query for CVTermList as requested

abcafcd

sbml document history is saved added relevant tests

better documentation

af5f3fa

uri.akavia added 8 commits July 19, 2022 20:23

some clarification and fixes of cvterm.py

bc6be3e

improve equality of History object some tidying of metadata.py and adding __eq__ function added test_old_style_annotation to test_metadata.py

reorganized metadata.py

10dd03c

fixed some changes in tests and example JSON due to changes in code

7c79e1c

finished documentation of metadata.py

8ad6e3d

moved annotation_xml to conftest.py

f4ffb8f

updated some data and tests

fix unification of CVTerms done by sbml

0564ffb

Some minor modifications and comments on sbml and test_sbml.py

simplified custompairs.py

988148d

This was referenced Nov 4, 2022

Store SBML meta information (level, version, packages, provenance) #810

Closed

Indication of model authors #907

Closed

Discussion for Group feature in COBRApy #954

Closed

cdiener added the stale The issue or pull request lacks activity. label Apr 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metadata fbc3 #1237

Metadata fbc3 #1237

akaviaLab commented Jun 19, 2022

akaviaLab commented Jul 8, 2022

cdiener commented Jul 8, 2022

akaviaLab commented Jul 12, 2022 •

edited

Loading

matthiaskoenig commented Jul 13, 2022

akaviaLab commented Jul 13, 2022 via email

matthiaskoenig commented Jul 13, 2022

akaviaLab commented Jul 13, 2022 via email

akaviaLab commented Jul 13, 2022 via email

cdiener commented Jul 13, 2022 •

edited

Loading

akaviaLab commented Jul 13, 2022

matthiaskoenig commented Jul 13, 2022 •

edited

Loading

akaviaLab commented Jul 14, 2022

matthiaskoenig commented Jul 14, 2022

cdiener commented Jul 14, 2022

Metadata fbc3 #1237

Are you sure you want to change the base?

Metadata fbc3 #1237

Conversation

akaviaLab commented Jun 19, 2022

akaviaLab commented Jul 8, 2022

cdiener commented Jul 8, 2022

akaviaLab commented Jul 12, 2022 • edited Loading

matthiaskoenig commented Jul 13, 2022

akaviaLab commented Jul 13, 2022 via email

matthiaskoenig commented Jul 13, 2022

akaviaLab commented Jul 13, 2022 via email

akaviaLab commented Jul 13, 2022 via email

cdiener commented Jul 13, 2022 • edited Loading

akaviaLab commented Jul 13, 2022

matthiaskoenig commented Jul 13, 2022 • edited Loading

akaviaLab commented Jul 14, 2022

matthiaskoenig commented Jul 14, 2022

cdiener commented Jul 14, 2022

akaviaLab commented Jul 12, 2022 •

edited

Loading

cdiener commented Jul 13, 2022 •

edited

Loading

matthiaskoenig commented Jul 13, 2022 •

edited

Loading