Mat cleanup #1220
changes to sbml.py to read annotations as list
improved validate.py so it won't fail with new files
modified annotation.xml, e_coli_core.xml to have more correct XML files and less errors compared to mat file
modified true values in test_annotation.py and test_sbml.py
got sbml to add the SBO term for subsystem
updated wrong_key_caps.mat file and the associated test
minor tidying of sbml.py, test_mat.py
some tidying and flake8 correction of test_sbml.py
…to MATLAB
Fixed some issues in mat.py
Made SBML read groups into subsystem.
Made SBML have group annotations as list
SBML will set undefined formula and model name as None
@cdiener - This is the updates we were discussing on gitter. |
Hi, thanks for the fixes. One comment before I start reviewing. For annotations, both the |
Yes, I changed it for that reason. |
Might become irrelevant if we update the annotations, but we can decide for now. |
@cdiener Ping. Can you please review this? |
I'm still somewhat reluctant to change the test models' annotation format here, mostly because the idea is to have the test models manually vetted and then adapt the code base to read them well. Tests and/or the code base still have to be able to recognize that |
Okay. We can discuss this in detail, but there are several changes to the annotation files (which perhaps should be in a separate PR)
Are these the changes you meant? |
All of those cases are fine with me. I would just revert the conversions from single annotations to list. That has to be fixed in the tests that should accept the equivalence I outlined in the previous comment. Regarding the comparison it sounded like @Midnighter had some opinion on that so that's why I was waiting on more feedback from him. |
I'm a bit confused about the list for single annotations. |
Because by changing the test models to be more convenient for you, you don't change the fact that annotations are still allowed to be strings or lists for single elements. You just remove examples where they are strings from the test suite which then covers less data formats. We need to remain backwards compatible with old models in JSON or YAML format and therefore that case still has to remain supported and tested until we change it to a new annotation format. But like I said it also comes down to only modifying the test models if they are actually wrong and not if it's more convenient to write tests. |
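The equivalence discussed here (a single annotation stored either as a bare string or as a one-element list) could be checked in tests roughly as follows. This is only a minimal sketch: `as_list` and `annotations_equivalent` are hypothetical helper names and are not part of cobra, and the annotation values are illustrative.

```python
def as_list(value):
    """Wrap a bare string in a list; return lists and tuples as a new list."""
    if isinstance(value, str):
        return [value]
    return list(value)


def annotations_equivalent(left, right):
    """Compare two annotation dicts, treating "x" and ["x"] as equal."""
    if left.keys() != right.keys():
        return False
    return all(as_list(left[key]) == as_list(right[key]) for key in left)


# Hypothetical example data: the same annotations in both styles.
old_style = {"bigg.reaction": "PFK", "sbo": "SBO:0000176"}
new_style = {"bigg.reaction": ["PFK"], "sbo": ["SBO:0000176"]}
print(annotations_equivalent(old_style, new_style))
```

With a comparison like this, the test models could keep their string-valued annotations while the readers are free to return lists.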
To be clear my comments are only aimed at the JSON test models you are changing. In the mat parsing you are free to return them as you wish. Also why are you changing almost all other test models in this PR (like pickle and SBML ones as well). What do those have to do with the Matlab interface? |
There are a lot of changes, and things got mixed.
Let me specify them, and mark which ones make less sense to be in this PR
(although I think some of them are needed in another PR, and some make
sense to be in the same PR). Can you please comment on each one so that we
are on the same page?
c14b306
Changes in mat.py (this PR)
modify tests/data/e_coli_core.xml
- ncbigi to ncbiprotein, since ncbigi is not a valid identifiers.org prefix
(Other PR)
- removed excess spaces (shouldn't have done that, should fix code to
ignore them)
- changed invalid identifiers like http://identifiers.org/bigg.reaction/ac
to http://identifiers.org/bigg.reaction/EX_ac_e - ??. Maybe I shouldn't
have done this at all?
annotations.xml
- change invalid ncbigi to ncbiprotein (other PR)
textbook.xml.gz - basically same changes as e_coli_core.xml, will process
according to decision there
sbml.py
- read annotations into list (other PR or not at all)
I saw line 1805 in sbml.py which said # FIXME: always in list, and thought
it would be a good idea
validate.py - fix according to changes in sbml.py
test_annotation.py, test_sbml.py - changed according to sbml.py
update_pickles.py (necessary because of changes in xml files, but should be
moved to other PR)
a9d7f18
make sbml.py subsystem reading add partonomy, which matches the definition
of SBO:0000633 (see https://sourceforge.net/p/sbo/term-request/113/) -
other PR
update_pickles.py
changes in test_mat.py to match the reading of CHEBI (this PR)
a77f773
mat.py - some tidying (this PR)
sbml.py - unify two separate ifs (other PR)
test_sbml.py - unify implicitly concatenated strings (other PR). Doesn't
change behavior, just code
219360b
tests/data/example_notes.xml - remove excess compartments, since excess
compartments are read into SBML, but not MAT. Not sure if this change
should happen, but since I made example_notes.xml I thought it would be fine
src/cobra/io/sbml.py
- read group back into subsystem to keep read/write consistent, also since
mat doesn't understand groups, only subsystem (not sure where this should
be, if at all)
- modify SBO to be in list (like line 1805, above, will modify according to
that)
mat.py - some changes (this PR)
307ec41
mat.py - correctly read back subsystems into groups from mat (this PR)
sbml.py - set empty values to None instead of '' (other PR?). This makes
sense to me since C++/SBML and Python seem to disagree on what is empty.
SBML does '', while Python does None.
77a1359
update_pickles.py and black - needed, but need to decide on other changes
f6244f2
black on test_annotation.py - needed, but will decide on all other changes
One reason these are all the same PR is that some of them would require a
lot of code if they were separate.
For example, if SBML should read things in lists (according to the FIXME
comment) and it is a separate PR, then Mat would have to have excess code
to deal with things that aren't lists. And then when/if SBML becomes list,
mat.py would need to remove excess code.
…On Sat, Jun 11, 2022 at 12:16 AM Christian Diener ***@***.***> wrote:
To be clear my comments are *only* aimed at the JSON test models you are
changing. In the mat parsing you are free to return them as you wish. Also
why are you changing almost all other test models in this PR (like pickle
and SBML ones as well). What do those have to do with the Matlab interface?
|
Codecov Report
@@ Coverage Diff @@
## devel #1220 +/- ##
==========================================
- Coverage 84.59% 84.33% -0.26%
==========================================
Files 66 66
Lines 5491 5509 +18
Branches 1264 1268 +4
==========================================
+ Hits 4645 4646 +1
- Misses 545 557 +12
- Partials 301 306 +5
|
Okay, I'm onboard with everything except for the following which I'd like to discuss:
In general that makes sense; however, that would be an API-breaking change, so we would need to release a new version for it and it would not be backwards compatible. Since there is another PR lined up that will add new annotation formatting and will need a new version as well, I would skip it in favour of the FBC3 PR.
Subsystems are deprecated in favor of groups and should only be read in the formats that still have them, like legacy SBML and Matlab. So I would not convert them here. In my book this info would be dropped, since it's data in groups that the Matlab format does not support. I get that this breaks roundtripping, and others may think differently. My worry would be what would happen if I have a "subsystem" group that uses advanced features like nested groups. How should that act? And the unmentioned changes in |
Other PR for everything else that isn't mat, or this PR? |
what would happen if I have "subsystem" group that uses advanced features like nested groups. How should that act?
The groups are only exported for reactions. If a reaction has both the attribute "subsystem" and groups, "subsystem" will win and be exported, and "groups" will be dropped. So if you have something like The subsystems when exported would look like |
I think in general it's easier to review several smaller PRs than a huge one. However, since we already discussed a lot of the things here, let's just continue as it is. If you want, just use a more descriptive title. Sorry, I misunderstood the subsystem thing. Your solution makes sense. |
Okay. So I think everything discussed is done. |
Looks good to me. Some minor comments and a doubt about duplicated model files.
"ncbigi": [
"GI:1208453",
"GI:1652654"
"ncbiprotein": [
What's the difference between the mini.json in src/cobra/data and the one in tests/data? Couldn't the tests just use the first one?
None.
The tests are using the one in tests, because that's how the functions are designed. The ones in src/cobra/data are example files for users installing cobra without development and without tests.
If we want to have tests rely on the files in src/cobra/data we can rewrite tests and update_pickles.py, which I'd be happy to do in a different PR.
That sounds good to me 👍
src/cobra/io/mat.py
@@ -856,7 +903,9 @@ def from_mat_struct(
     rxn_group_names = set(rxn_subsystems).difference({None})
     new_groups = []
     for g_name in sorted(rxn_group_names):
-        group_members = model.reactions.query(lambda x: x.subsystem == g_name)
+        group_members = model.reactions.query(
+            lambda x, g_n=g_name: x.subsystem == g_n
I don't think it makes sense to define default values for a lambda function. The previous one looks better to me.
Okay. My linter complained about it. Reverted.
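For context on the linter warning: a lambda defined in a loop looks up the loop variable when it is called, not when it is defined, and binding it as a default argument is the usual workaround. The sketch below uses plain strings instead of cobra reactions, and all names in it are illustrative.

```python
names = ["Glycolysis", "TCA"]

# Late binding: each stored lambda looks up `name` when it is called,
# so once the loop has finished they all compare against the last value.
late = [lambda subsystem: subsystem == name for name in names]

# Default argument: the current value of `name` is captured at definition time.
bound = [lambda subsystem, n=name: subsystem == n for name in names]

late_results = [check("Glycolysis") for check in late]
bound_results = [check("Glycolysis") for check in bound]
print(late_results)   # both lambdas compare against "TCA"
print(bound_results)  # each lambda compares against its own name
```

The difference only shows up when the lambdas outlive the iteration that created them. In the hunk above the lambda is passed straight to `query`, so, assuming `query` evaluates it before the loop advances, both forms give the same result and the simpler one is fine.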
@@ -598,7 +598,7 @@ def _sbml_to_model(
     if not libsbml.SyntaxChecker.isValidSBMLSId(model_id):
         LOGGER.error(f"'{model_id}' is not a valid SBML 'SId'.")
     cobra_model = Model(model_id)
-    cobra_model.name = model.getName()
+    cobra_model.name = model.getName() or None
Above you marked SBML returning a '' as model name by default as TODO but it seems like this fixes that, right?
Yes. This does fix it. I can remove the TODO.
Removed TODO
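The `or None` idiom in the hunk above can be illustrated without libSBML. In this minimal sketch, `FakeSBMLModel` and `read_name` are hypothetical stand-ins; the only assumption taken from the discussion is that `getName()` returns an empty string when no name is set.

```python
class FakeSBMLModel:
    """Hypothetical stand-in for an SBML model object."""

    def __init__(self, name=""):
        self._name = name

    def getName(self):
        # Assumed behaviour: an unset name comes back as an empty string.
        return self._name


def read_name(sbml_model):
    """Return the model name, mapping the empty string to None."""
    return sbml_model.getName() or None


print(read_name(FakeSBMLModel()))
print(read_name(FakeSBMLModel("e_coli_core")))
```

Note that `or None` maps every falsy value to None; for a string return value that is only the empty string.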
Looks good. I'll merge after the typo is fixed.
Awesome, thanks so much for your patience! |
This makes some minor changes to SBML, mostly that annotations (and others) will be loaded as lists. It also changes the example models a bit. The goal is to make sure that a model loaded from SBML, written to MATLAB and then reread is identical to the original.
Fixes some of the FIXMEs in sbml.py saying "Should be a list"