feat: single ec-code annotation per reaction #319
Comments
This is a good idea. It should also be noted that reactions are mainly identified based on sequence, not by EC number, as one EC number can be mapped to multiple reactions (in some cases). Currently, the EC number for each protein can be found from UniProt or SGD. Previously I found that the EC number annotations from different databases are not the same. We can still make this step automatic by referring to a standard reaction database like Rhea, MetaNetX or ModelSEED.
But by definition, there should really be only one EC number per reaction, although perhaps with some wildcards if the exact substrate or cofactor has not been given a specific EC number (in that case, something like EC1.4.2.-). But indeed, we can automatically pull these from databases, and MetaNetX will probably be useful. Manual curation will be required to resolve cases where we find multiple EC numbers for the same reaction.
I fully support/encourage automatic mapping of EC numbers. A good way to map to MNX would be via Rhea IDs; however, these are missing from the model annotation, so I would suggest using the BridgeDB API.
I just checked the EC numbers for each reaction in yeast-GEM. Most reactions have one unique EC number. About 490 reactions have multiple EC numbers. It would be wonderful to have some automatic way to curate these EC numbers.
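A check like the one described above could be sketched as follows. This is only an illustration, not the script that was actually used: it assumes the annotations have been exported as a plain list of strings, one per reaction, with multiple EC codes separated by `;` and no `EC` prefix. The function names and the example data are hypothetical.

```python
from collections import Counter


def split_ec_codes(entry: str) -> list[str]:
    """Split one annotation string into individual EC codes.

    Assumes multiple codes are separated by ';' (an assumption about
    how the annotation is stored, not confirmed in this thread).
    """
    return [code.strip() for code in entry.split(";") if code.strip()]


def count_by_ec_multiplicity(eccodes: list[str]) -> Counter:
    """Tally reactions that have no, one, or multiple EC codes."""
    tally = Counter()
    for entry in eccodes:
        n = len(split_ec_codes(entry))
        if n == 0:
            tally["none"] += 1
        elif n == 1:
            tally["single"] += 1
        else:
            tally["multiple"] += 1
    return tally


# Hypothetical example data, not taken from yeast-GEM.
example = ["1.1.1.1", "", "2.7.1.1;2.7.1.2", "3.5.4.-"]
print(count_by_ec_multiplicity(example))
```

The reactions falling in the "multiple" and "none" groups are the ones that would need attention.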
Indeed, there are many that are already fine. I'm not 100% certain how automated curation would work, because many gene/protein-based approaches (e.g. BridgeDB, UniProt) are not ideal: it is not uncommon for enzymes to have been assigned multiple EC numbers, either because they are multifunctional enzymes or because they are part of a complex where subunits perform dedicated functions. GECKO is however doing this based on UniProt and KEGG. So it will probably be a mixture of using GECKO's function, supplemented with looking at other annotations. And most likely we'd manually have to define EC numbers with wildcards for those reactions that have no specific EC number (we want the EC numbers to exactly match the reaction: if it doesn't match, use a wildcard, until all reactions (except exchange and non-active transport) have been assigned a (partial) EC number). This does not have very high priority, but would be good to address.
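To make the wildcard idea concrete, here is one possible heuristic, sketched under assumptions: when a reaction carries several EC codes, keep the levels they all share and replace the rest with `-`. This is not GECKO's approach and not a substitute for manual curation, since a shared prefix does not guarantee that the result matches the reaction. It only proposes a candidate for review. The function name is hypothetical, and codes are assumed to be bare numbers without an `EC` prefix.

```python
def consensus_ec(codes: list[str]) -> str:
    """Return the EC levels shared by all codes, padded with '-' wildcards.

    Heuristic only: the result is a candidate for manual review.
    """
    if not codes:
        raise ValueError("need at least one EC code")
    # Split each code into at most four levels, e.g. "2.7.1.1" -> ["2", "7", "1", "1"].
    levels = [code.split(".")[:4] for code in codes]
    shared = []
    for parts in zip(*levels):
        # Stop at the first level where the codes disagree or are already wildcards.
        if len(set(parts)) != 1 or parts[0] == "-":
            break
        shared.append(parts[0])
    return ".".join(shared + ["-"] * (4 - len(shared)))


# Hypothetical examples.
print(consensus_ec(["2.7.1.1", "2.7.1.2"]))
print(consensus_ec(["1.1.1.1", "2.7.1.1"]))
```

When the codes do not even share the first level, every level becomes a wildcard. That is a sign the enzyme is multifunctional or part of a complex, and the reaction needs to be curated by hand.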
Description of the issue:
Many reactions do not have any ec-code, or are annotated with multiple ec-codes. Instead, each reaction should be annotated with one ec-code, and if no full ec-code can be defined (with 4 sets of digits), then wildcards can be given.
This might involve manual curation, although parsing gene associations through UniProt might be helpful.
These single ec-codes can then be used when constructing GECKO models.
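As an illustration of what "one ec-code, possibly with wildcards" could mean in practice, the following sketch checks whether an entry is a single EC code with exactly four levels, where trailing levels may be `-`. The rules encoded here are assumptions: the entry is expected without an `EC` prefix, a number may not follow a wildcard, and preliminary codes such as those with an `n` serial are not handled. The function names are hypothetical.

```python
def is_single_ec(entry: str) -> bool:
    """Check that an entry is one EC code with four levels and only trailing wildcards."""
    parts = entry.strip().split(".")
    if len(parts) != 4 or parts[0] == "-":
        return False
    seen_wildcard = False
    for part in parts:
        if part == "-":
            seen_wildcard = True
        elif part.isdigit() and not seen_wildcard:
            continue
        else:
            # Not a number, or a number after a wildcard (e.g. "1.-.2.3").
            return False
    return True


def find_invalid(eccodes: list[str]) -> list[int]:
    """Return the indices of entries that are not a single EC code."""
    return [i for i, entry in enumerate(eccodes) if not is_single_ec(entry)]


# Hypothetical example data.
example = ["1.4.2.-", "2.7.1.1;2.7.1.2", "", "1.-.2.3", "3.5.4.4"]
print(find_invalid(example))
```

Entries with several codes, empty entries, and malformed wildcards are all reported, so the same check covers both missing and duplicated annotations.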
Expected feature/value/output:
model.eccodes(:) should give a single ec-code entry for each reaction.
Current feature/value/output:
I hereby confirm that I have:
main branch of the repository