Add training codes for NER and format IDP4+ data to create baseline model #6
Comments
Added the data prep code, with IDP4+ download and conversion as the default, to set up the baseline model.
@valearna Added the main training script, but I think it looks off compared to the pattern of the rest of the code. It might be a good idea to rearrange this part later so it's more intuitive to use.
Ok, thanks @rishabgit. I'll also take a look at it.
Refactored the training script into an importable module, like the data prep code.
Can you give me an update on this part, @rishabgit? Is this all in the train_ner.py script? I could add an entry point for the script in the setup.py file so that it gets installed automatically when users install the genomic info package on their system: https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html
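A minimal sketch of the entry-point idea, assuming train_ner.py sits under genomicinfo/entity_extraction/ner/ and exposes a `main()` function. The command name `train-ner` and the `--data-dir` and `--epochs` options are hypothetical and not taken from the repository; the setup.py fragment is shown as a comment.

```python
# Hypothetical layout of train_ner.py so that a console script can call it.
#
# Matching setup.py fragment (command name and module path are assumptions):
#
#     setup(
#         ...,
#         entry_points={
#             "console_scripts": [
#                 "train-ner = genomicinfo.entity_extraction.ner.train_ner:main",
#             ],
#         },
#     )
import argparse


def build_parser():
    """Build the command-line parser (option names are illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="train-ner", description="Train the NER baseline model."
    )
    parser.add_argument(
        "--data-dir", default="data", help="directory with the processed IDP4+ data"
    )
    parser.add_argument(
        "--epochs", type=int, default=10, help="number of training epochs"
    )
    return parser


def main(argv=None):
    """Entry point; with argv=None argparse reads sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # The real training call would go here; this sketch only reports the settings.
    print(f"Would train on {args.data_dir!r} for {args.epochs} epochs")
    return 0
```

With an entry point like this, installing the package only places a `train-ner` launcher on the user's PATH; nothing is downloaded or trained until the command is run.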
@valearna It's a two-step process. You need to pull and process the IDP4+ data, and then run the train_ner.py script on it. I added a readme with the commands - https://github.com/rishabgit/genomic-info-from-papers/blob/main/genomicinfo/entity_extraction/ner/readme.md I'm not sure that running these two steps automatically when the user installs this package would be a good idea, since the downloading and training would then happen by default for all users, even if they don't plan to use the NER block at all (TODO feature - #10). Also, if the user has an old CPU and no GPU, training might take more than an hour (I haven't tested exactly how long).
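One way to keep the two steps explicit and opt-in, sketched under assumptions: `run_pipeline`, the `.prepared` marker file, and the `prepare`/`train` callables are all hypothetical stand-ins for the real data prep and train_ner.py code, which this sketch does not reproduce. Nothing runs at import or install time; the download step is skipped when the data has already been prepared.

```python
from pathlib import Path


def run_pipeline(data_dir, prepare, train, force_download=False):
    """Run data preparation (only if needed) and then training.

    `prepare` and `train` are callables supplied by the caller; each
    receives the data directory as a Path. Returns the steps that ran.
    """
    data_dir = Path(data_dir)
    marker = data_dir / ".prepared"  # hypothetical marker written after step 1
    steps = []
    if force_download or not marker.exists():
        data_dir.mkdir(parents=True, exist_ok=True)
        prepare(data_dir)  # step 1: pull and convert the data
        marker.touch()
        steps.append("prepare")
    train(data_dir)  # step 2: train on the prepared data
    steps.append("train")
    return steps
```

A user who never calls `run_pipeline` pays no download or training cost, and a second call reuses the prepared data instead of downloading it again.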
Sounds good, thanks @rishabgit.