Add training codes for NER and format IDP4+ data to create baseline model #6

Open
rishabgit opened this issue Oct 8, 2021 · 7 comments
@rishabgit
Owner

Download link

@rishabgit rishabgit self-assigned this Oct 8, 2021
@rishabgit
Owner Author

Added the data prep code, with the IDP4+ download and conversion as the default, to set up the baseline model.

@rishabgit
Owner Author

@valearna Added the main training script, but I think it looks off compared to the pattern of the rest of the code. It might be a good idea to rearrange this part later so it's more intuitive to use.

@valearna
Collaborator

OK, thanks @rishabgit. I'll take a look at it too.

@rishabgit
Owner Author

rishabgit commented Oct 24, 2021

Refactored the training script into an importable module, like the data prep code.

@rishabgit rishabgit reopened this Oct 24, 2021
@valearna
Collaborator

valearna commented Nov 1, 2021

Can you give me an update on this part, @rishabgit? Is this all in the train_ner.py script? I could add an entry point for the script in the setup.py file so that it gets installed automatically when users install the genomic info package on their system: https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html

@rishabgit
Owner Author

@valearna It's a two-step process. You need to pull and process the IDP4+ data, and then run the train_ner.py script on it.

I added a readme with the commands - https://github.com/rishabgit/genomic-info-from-papers/blob/main/genomicinfo/entity_extraction/ner/readme.md

I'm not sure running these two steps automatically when the user installs this package would be a good idea, since the downloading and training would then happen by default for all users, even those who don't plan to use the NER block at all (TODO feature - #10). Also, if a user has an old CPU and no GPU, training might take more than an hour (I haven't tested exactly how long).
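
One possible middle ground, sketched here only as an illustration: an entry point installs a command but does not run anything at install time, so the two steps could be exposed as subcommands that execute only when a user asks for them. Everything in this sketch is an assumption rather than code from the repository: the command name `genomicinfo-ner`, the module path in the comment, and the `prepare_data` and `train_model` placeholders, which stand in for the real data prep and training code.

```python
import argparse

# In setup.py, a command like this could be registered with (hypothetical
# command name and module path):
#   entry_points={"console_scripts": [
#       "genomicinfo-ner=genomicinfo.entity_extraction.ner.cli:main"]}
# Installing the package would then only create the command; nothing is
# downloaded or trained until the user runs it.


def prepare_data(output_dir):
    """Placeholder for step 1: download and convert the IDP4+ data."""
    print(f"would download and convert IDP4+ data into {output_dir}")


def train_model(data_dir):
    """Placeholder for step 2: train the NER baseline on the converted data."""
    print(f"would train the NER baseline on data in {data_dir}")


def build_parser():
    parser = argparse.ArgumentParser(prog="genomicinfo-ner")
    subparsers = parser.add_subparsers(dest="command", required=True)

    prepare = subparsers.add_parser("prepare", help="download and convert IDP4+ data")
    prepare.add_argument("--output-dir", default="data")

    train = subparsers.add_parser("train", help="train the NER baseline")
    train.add_argument("--data-dir", default="data")
    return parser


def main(argv=None):
    """Dispatch to the requested step; returns 0 so it works as an exit code."""
    args = build_parser().parse_args(argv)
    if args.command == "prepare":
        prepare_data(args.output_dir)
    else:
        train_model(args.data_dir)
    return 0
```

With something like this in place, a user who wants the NER block would run `genomicinfo-ner prepare` and then `genomicinfo-ner train`, while everyone else pays no download or training cost.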

@valearna
Collaborator

Sounds good, thanks @rishabgit
