-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update performance test and README to include more fair comparison with pubchempy #14
Conversation
…slateable using both OPSIN and PubChem
…lightly softened language
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one small note - otherwise I think it's about ready to go! Will need to be updated as well.
README.md
Outdated
## Speedup 50x from `pubchempy` | ||
`py2opsin` runs locally and is smaller in scope in what it provides, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` will execute ~58x faster than an equivalent call to `puchempy`: | ||
## Massive speedup from `pubchempy` for batch translations | ||
`py2opsin` runs locally and is smaller in scope in what it provides, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` will execute ~23 faster than an equivalent call to `pubchempy`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get rid of the numerical comparison here - going to vary based on chemicals, machines, internet speed, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fixed that up in the latest commits
It looks like the performance tests have the possibility of failing if the PubChem server can't be reached. We may want to include error handling |
I think this might actually be a limitation of the GitHub runners - they are notoriously weird. On the second run, it passed. |
Ah, I spoke too soon - great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this one! We will bear in mind the comments made about possible network issues in case this comes up again in the future.
This PR makes two small changes -
Updates the compound list in
test_performance.py
to include only 100 molecules that are known to be translateable using both OPSIN and PubChem. The previous list of compounds had many species that couldn't be resolved using PubChem, resulting in a potentially unfair comparison between the two tools. Also, the compound list was fairly long - ideally we want to keep the test list as short as possible so as to not spam the PubChem server.Updates the README to reflect this change, and modifies the language describing the comparison. In particular, the speedup will depend on the size of the input.
py2opsin
has a non-negligible startup time, and if you are only translating 1 species at a time,py2opsin
performance is actually comparable to querying PubChem. The real speedup comes from the batch translations, which is now indicated in the README.