feat: gene name compliance with cobrapy #216

BenjaSanchez · 2020-04-21T14:39:58Z

Main improvements in this PR:

This partly addresses #102. After the merge of opencobra/cobrapy#685, every component in yeast-GEM is properly parsed when loaded into cobrapy, with the exception of gene names (notebook with the 1st analysis here). This is because cobratoolbox stores gene names in the fbc:label field of the xml file, whereas cobrapy looks for them in the fbc:name field. So instead, I've adapted the Matlab saving wrapper so that it copies model.geneNames onto model.proteins, as the latter is written to the compliant fbc:name field. By doing this we won't lose any information, as model.proteins only held placeholders until now. The only downside is that the xml will now have duplicated info:

<fbc:geneProduct metaid="G_YPL078C" fbc:id="G_YPL078C" fbc:name="ATP4" fbc:label="ATP4"/>

But as the wrapper now overwrites the protein info, and cobrapy only reads one of the two fields, we should not have any issues. Also, from now on there is no need to redefine the model.proteins field each time @feiranl.
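To illustrate the difference described above, here is a minimal sketch (not cobrapy's or cobratoolbox's actual parsing code) that reads the gene name from a geneProduct element using only the Python standard library. The helper gene_name and its prefer argument are hypothetical names made up for this example, and the namespace assumes the fbc version 2 package.

```python
import xml.etree.ElementTree as ET

# Namespace of the SBML fbc package, version 2 (assumed for this sketch).
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"

# The geneProduct entry from this PR, with the namespace declared inline
# so that the snippet can be parsed on its own.
snippet = (
    f'<fbc:geneProduct xmlns:fbc="{FBC_NS}" metaid="G_YPL078C" '
    'fbc:id="G_YPL078C" fbc:name="ATP4" fbc:label="ATP4"/>'
)


def gene_name(element, prefer="name"):
    """Hypothetical helper: return the gene name from the preferred fbc
    attribute, falling back to the other attribute if it is missing."""
    order = ["name", "label"] if prefer == "name" else ["label", "name"]
    for attr in order:
        value = element.get(f"{{{FBC_NS}}}{attr}")
        if value:
            return value
    return None


element = ET.fromstring(snippet)
name_first = gene_name(element, prefer="name")    # the field cobrapy looks at
label_first = gene_name(element, prefer="label")  # the field cobratoolbox writes
```

With the duplicated attributes both lookups return the same value, which is why the duplication is harmless for readers that only consult one of the two fields.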

I hereby confirm that I have:

Tested my code with all requirements for running the model
Selected devel as a target branch (top left drop-down menu)

BenjaSanchez · 2020-04-23T09:09:54Z

@Midnighter do you foresee any potential issues with this decision? Were you aware of this difference between cobrapy and cobratoolbox in how gene names are stored in the xml file?

Midnighter · 2020-04-23T09:19:19Z

I was not aware of this difference. Taking a look at the FBC specification, the label field is a required attribute so practically it seems like the better choice. Since it is required, cobrapy must also set it when writing to SBML, though, doesn't it?

Can you please raise an issue on cobrapy about it?

BenjaSanchez · 2020-04-23T10:15:13Z

@Midnighter it does indeed set it, just equal to the id from what I see in other models. I believe what cobrapy currently does (storing the gene name in the name field) is the better-suited choice, and therefore this issue should maybe be raised on cobratoolbox instead. But let me know if you have a different view.

edkerk · 2020-04-23T11:39:09Z

IMHO, the proposed solution here should just be a workaround, as geneNames and proteins are not necessarily the same thing (alternative splicing can give different proteins). To solve this, either COBRA Toolbox or cobrapy should modify their code. FYI, RAVEN also stores it in label, to stay compatible with COBRA Toolbox.

BenjaSanchez · 2020-04-23T11:51:57Z

@edkerk I agree, this is a temporary workaround to guarantee that cobrapy users can get all the model fields they need. The permanent solution is to have cobratoolbox and cobrapy agree on one format, and as soon as that happens we can drop this change and entirely remove the proteins field (which holds no info atm). I can open an issue in our repo to remember to do that. Let me know if you think another approach would be better.

Midnighter · 2020-04-23T12:56:06Z

From my perspective it's a good idea to coordinate on this. I don't really care if it's an issue on the toolbox or on cobrapy.

edkerk · 2020-04-23T13:38:12Z

Following @Midnighter's earlier comment, label is required, which means that it might be best if cobrapy adjusts to adhere to this?

BenjaSanchez · 2020-04-23T15:35:09Z

@edkerk I have now opened opencobra/cobrapy#950 for discussion and #217 as a to-do in this repo once a solution is found. If anything else is needed let me know in a review :)

feiranl

Great!

BenjaSanchez added 2 commits April 21, 2020 15:50

test: model properly loaded by cobrapy

52c24ab

test if all model components are loaded when cobrapy is used

fix: replace proteins with geneNames

434d240

for compliance with cobrapy
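As a rough illustration of what such a check could look like (a sketch only, not the test added in commit 52c24ab), the function below flags genes whose name did not survive loading. It assumes that a gene without a parsed name ends up with an empty name or with a name equal to its id; genes_without_names and the stub model are hypothetical and stand in for a model loaded with cobrapy.

```python
from types import SimpleNamespace


def genes_without_names(model):
    """Return the ids of genes whose name is empty or just repeats the id.

    Only relies on model.genes and the id/name attributes of each gene,
    so it works on any object with that shape.
    """
    return [g.id for g in model.genes if not g.name or g.name == g.id]


# Stand-in for a loaded model. With cobrapy installed, one would instead
# load the xml file with cobra.io.read_sbml_model and pass the result in.
stub_model = SimpleNamespace(
    genes=[
        SimpleNamespace(id="YPL078C", name="ATP4"),  # name was parsed
        SimpleNamespace(id="GENE_B", name=""),       # made-up gene with a lost name
    ]
)

missing = genes_without_names(stub_model)
```

An empty result would indicate that every gene kept a name distinct from its id after loading.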

BenjaSanchez added the enhancement new field/feature label Apr 21, 2020

BenjaSanchez requested review from edkerk and feiranl April 21, 2020 14:39

This was referenced Apr 23, 2020

fix: inconsistency in the gene name storage in SBML files opencobra/cobrapy#950

Open

fix: avoid duplicity in gene name storage #217

Open

edkerk approved these changes Apr 23, 2020

feiranl approved these changes Apr 23, 2020

BenjaSanchez merged commit 4722c45 into devel Apr 24, 2020

BenjaSanchez deleted the fix/gene-names branch April 24, 2020 08:09

BenjaSanchez mentioned this pull request Apr 24, 2020

fix: compatibility with cobrapy #102

Closed

31 tasks

BenjaSanchez added a commit that referenced this pull request May 20, 2020

fix: changes to .gitignore

219203e

PR #216 created conflicts in master

BenjaSanchez mentioned this pull request Jun 12, 2020

yeast 8.4.0 #229

Merged
