feat: single ec-code annotation per reaction #319

Open
3 of 4 tasks
edkerk opened this issue Jun 5, 2022 · 5 comments

@edkerk
Member

edkerk commented Jun 5, 2022

Description of the issue:

Many reactions either lack an EC code or are annotated with multiple EC codes. Instead, each reaction should be annotated with exactly one EC code; if no full EC code (with all four sets of digits) can be defined, wildcards can be used.

This might involve manual curation, although parsing gene associations through UniProt might be helpful.

These single ec-codes can then be used when constructing GECKO models.

Expected feature/value/output:

model.eccodes(:) should give a single ec-code entry for each reaction.
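To make "a single ec-code entry" checkable, a minimal Python sketch of a validator is shown here. It assumes the entries have been exported from the MATLAB model as plain strings, and it assumes a rule that is not stated in this issue: four dot-separated fields, the first numeric, and wildcards ("-") allowed only at the end. The names `EC_PATTERN` and `is_single_ec` are hypothetical, and preliminary "n"-numbered EC codes are not handled.

```python
import re

# Hypothetical rule: four dot-separated fields, the first numeric,
# the rest numeric or the wildcard "-".
EC_PATTERN = re.compile(r"\d+(\.(\d+|-)){3}")


def is_single_ec(code: str) -> bool:
    """Return True if `code` holds exactly one EC number, allowing trailing wildcards."""
    if not EC_PATTERN.fullmatch(code):
        return False
    # Once a wildcard appears, every later field must also be a wildcard,
    # so "2.7.-.-" is accepted but "2.-.1.-" is not.
    seen_wildcard = False
    for field in code.split("."):
        if field == "-":
            seen_wildcard = True
        elif seen_wildcard:
            return False
    return True


for entry in ["1.1.1.4", "1.1.99.-", "1.1.2.4;1.1.99.-", "", "2.-.1.-"]:
    print(repr(entry), is_single_ec(entry))
```

Entries with several ';'-separated codes, empty entries and misplaced wildcards are all rejected, which matches the expected output described in this issue.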

Current feature/value/output:

>> model.eccodes([1,3,6,17,22]) % Some random example reactions

ans =

  5×1 cell array

    {'1.1.2.4;1.1.99.-'                              }
    {'1.1.1.4'                                       }
    {0×0 char                                        }
    {'1.14.13.-;2.1.1.114;2.1.1.201;2.1.1.64;2.7.-.-'}
    {0×0 char                                        }
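The five entries above fall into three groups: missing, single and multiple. The following is a rough Python sketch of tallying those groups across a model. It assumes the `eccodes` cell array has been exported as a list of strings, with empty cells becoming `""`, and that multiple codes are separated by ';' as shown. `classify_eccode` is a hypothetical helper name.

```python
from collections import Counter


def classify_eccode(entry: str) -> str:
    """Label an eccodes entry as 'missing', 'single' or 'multiple'.

    Assumes multiple EC numbers are separated by ';'.
    """
    codes = [c.strip() for c in entry.split(";") if c.strip()]
    if not codes:
        return "missing"
    return "single" if len(codes) == 1 else "multiple"


# The five example entries from the output (0x0 char written as "").
example = [
    "1.1.2.4;1.1.99.-",
    "1.1.1.4",
    "",
    "1.14.13.-;2.1.1.114;2.1.1.201;2.1.1.64;2.7.-.-",
    "",
]

counts = Counter(classify_eccode(e) for e in example)
print(dict(counts))
```

Running the same tally over every reaction gives a quick measure of how much curation remains.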

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the main branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue
@hongzhonglu
Collaborator

hongzhonglu commented Jun 6, 2022

This is a good idea. It should also be noted that reactions are mainly assigned based on sequence, not by EC number, since one EC number can map to multiple reactions in some cases. Currently, the EC number for each protein can be found in UniProt or SGD. I previously found that the EC number annotations from different databases are not the same. We could still make this step automatic by referring to a standard reaction database such as Rhea, MetaNetX or ModelSEED.
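One way to see how far two databases disagree is to compare the EC sets they give for each protein. Below is a minimal Python sketch of such a comparison. The protein IDs and EC assignments are hypothetical placeholders, not real UniProt or SGD data, and how the annotations would actually be retrieved from either database is left open.

```python
from typing import Dict, Set


def compare_sources(source_a: Dict[str, Set[str]],
                    source_b: Dict[str, Set[str]]) -> Dict[str, str]:
    """Compare per-protein EC annotations from two databases."""
    report = {}
    for protein in sorted(source_a.keys() | source_b.keys()):
        a = source_a.get(protein, set())
        b = source_b.get(protein, set())
        if a == b:
            report[protein] = "agree"
        elif not a or not b:
            report[protein] = "missing in one source"
        elif a & b:
            report[protein] = "partial overlap"
        else:
            report[protein] = "disagree"
    return report


# Hypothetical toy annotations; protein IDs and EC assignments are placeholders.
db_one = {"PROT_A": {"1.1.1.1"}, "PROT_B": {"2.7.1.1", "2.7.1.2"}, "PROT_C": {"3.1.1.1"}}
db_two = {"PROT_A": {"1.1.1.1"}, "PROT_B": {"2.7.1.1"}, "PROT_C": {"3.1.1.3"},
          "PROT_D": {"4.2.1.1"}}

for protein, status in compare_sources(db_one, db_two).items():
    print(protein, status)
```

Proteins labelled "partial overlap" or "disagree" would be the natural starting point for checking against a reaction database.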

@edkerk
Member Author

edkerk commented Jun 6, 2022

But by definition there should be only one EC number per reaction, possibly with wildcards if the exact substrate or cofactor has not been given a specific EC number (in that case, something like EC 1.4.2.-). Indeed, we can pull these automatically from databases; MetaNetX will probably be useful. Manual curation will be required to resolve cases where we find multiple EC numbers for the same reaction.
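A possible automatic first pass for reactions with several EC numbers is to keep only the fields they all share and pad the rest with wildcards. This is a heuristic sketch in Python, not a method agreed on in this issue. The result can be much less specific than the correct EC number (for example when one of the listed codes is simply wrong), so it is at best a candidate for manual review. `shared_partial_ec` is a hypothetical name, and the code assumes every EC number has four fields.

```python
from typing import List, Optional


def shared_partial_ec(codes: List[str]) -> Optional[str]:
    """Return the most specific EC number shared by all codes, padded with wildcards.

    Returns None when the codes do not share even the first (class) field,
    which signals that manual curation is needed.
    """
    shared = []
    for fields in zip(*(code.split(".") for code in codes)):
        # Keep a field only if every code has the same, non-wildcard value there.
        if len(set(fields)) == 1 and fields[0] != "-":
            shared.append(fields[0])
        else:
            break
    if not shared:
        return None
    return ".".join(shared + ["-"] * (4 - len(shared)))


print(shared_partial_ec("2.1.1.114;2.1.1.201;2.1.1.64".split(";")))  # 2.1.1.-
print(shared_partial_ec("1.1.2.4;1.1.99.-".split(";")))  # 1.1.-.-
print(shared_partial_ec(
    "1.14.13.-;2.1.1.114;2.1.1.201;2.1.1.64;2.7.-.-".split(";")))  # None
```

A single code passes through unchanged, so the function can be applied to every reaction; a `None` result marks reactions whose codes span different EC classes.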

@edkerk edkerk self-assigned this Jun 7, 2022
@mihai-sysbio
Member

I fully support and encourage automatic mapping of EC numbers. A good way to map to MNX (MetaNetX) would be via Rhea IDs; however, these are missing from the model annotation, so I would instead suggest using the BridgeDB API.

@hongzhonglu
Collaborator

hongzhonglu commented Jun 9, 2022

I just checked the EC number for each reaction in yeast-GEM. Most reactions have one unique EC number, but about 490 reactions have multiple EC numbers. It would be wonderful to have an automatic way to curate these EC numbers.

@edkerk
Member Author

edkerk commented Jun 9, 2022

Indeed, there are many that are already fine.

I'm not 100% certain how automated curation would work, because many gene/protein-based approaches (e.g. BridgeDB, UniProt) are not ideal: it is not uncommon for an enzyme to have been assigned multiple EC numbers (because it is multifunctional), or for it to be part of a complex whose subunits perform dedicated functions. GECKO does, however, do this based on UniProt and KEGG.

Probably a mixture would work: use GECKO's function, supplemented by looking at other annotations. Most likely we will have to manually define EC numbers with wildcards for those reactions that have no specific EC number. We want the EC numbers to match the reaction exactly: if none matches, use a wildcard, until all reactions (except exchange and non-active transport reactions) have an assigned (partial) EC number.
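The stopping criterion described here (every reaction except exchange and non-active transport carries exactly one, possibly partial, EC number) can be tracked as a curation backlog. Below is a minimal Python sketch. The `Reaction` class, its `is_exchange` and `is_passive_transport` flags, and the reaction IDs are hypothetical; how those flags would be derived from the model is not specified in this issue. The check only counts codes per reaction and does not validate their format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    rxn_id: str
    eccodes: str                 # ';'-separated, like the model.eccodes entries
    is_exchange: bool            # hypothetical flag; derivation not specified
    is_passive_transport: bool   # hypothetical flag for non-active transport


def needs_curation(rxn: Reaction) -> bool:
    """A reaction is done when it is exempt or carries exactly one (partial) EC number."""
    if rxn.is_exchange or rxn.is_passive_transport:
        return False
    codes = [c.strip() for c in rxn.eccodes.split(";") if c.strip()]
    return len(codes) != 1


def curation_backlog(reactions: List[Reaction]) -> List[str]:
    """IDs of reactions that still need an EC number assigned or resolved."""
    return [r.rxn_id for r in reactions if needs_curation(r)]


# Hypothetical reactions for illustration only.
reactions = [
    Reaction("rxn_1", "1.1.1.4", False, False),
    Reaction("rxn_2", "1.1.2.4;1.1.99.-", False, False),
    Reaction("rxn_3", "", False, False),
    Reaction("rxn_4", "", True, False),
    Reaction("rxn_5", "", False, True),
]
print(curation_backlog(reactions))
```

Curation would be complete under this criterion once the backlog is empty.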

None of this is very high priority, but it would be good to address.
