Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update performance test and README to include more fair comparison with pubchempy #14

Merged
merged 5 commits into from
Jul 10, 2023

Conversation

jonwzheng
Copy link
Collaborator

This PR makes two small changes -

  1. Updates the compound list in test_performance.py to include only 100 molecules that are known to be translateable using both OPSIN and PubChem. The previous list of compounds had many species that couldn't be resolved using PubChem, resulting in a potentially unfair comparison between the two tools. Also, the compound list was fairly long - ideally we want to keep the test list as short as possible so as to not spam the PubChem server.

  2. Updates the README to reflect this change, and modifies the language describing the comparison. In particular, the speedup will depend on the size of the input. py2opsin has a non-negligible startup time, and if you are only translating 1 species at a time, py2opsin performance is actually comparable to querying PubChem. The real speedup comes from the batch translations, which is now indicated in the README.

Copy link
Owner

@JacksonBurns JacksonBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small note - otherwise I think it's about ready to go! Will need to be updated as well.

README.md Outdated
## Speedup 50x from `pubchempy`
`py2opsin` runs locally and is smaller in scope in what it provides, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` will execute ~58x faster than an equivalent call to `puchempy`:
## Massive speedup from `pubchempy` for batch translations
`py2opsin` runs locally and is smaller in scope in what it provides, which makes it __dramatically__ faster at resolving identifiers. In the code block below, the call to `py2opsin` will execute ~23 faster than an equivalent call to `pubchempy`:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of the numerical comparison here - going to vary based on chemicals, machines, internet speed, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed that up in the latest commits

test/test_performance.py Show resolved Hide resolved
@jonwzheng
Copy link
Collaborator Author

It looks like the performance tests have the possibility of failing if the PubChem server can't be reached. We may want to include error handling
This might happen if too many queries are sent in a short time (PubChem recommends limiting access to 5 queries per second)... but making that limitation might contaminate the speed comparison.

@JacksonBurns
Copy link
Owner

I think this might actually be a limitation of the GitHub runners - they are notoriously weird. On the second run, it passed.

@jonwzheng
Copy link
Collaborator Author

Ah, I spoke too soon - great!

@JacksonBurns JacksonBurns self-requested a review July 10, 2023 16:53
Copy link
Owner

@JacksonBurns JacksonBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this one! We will bear in mind the comments made about possible network issues in case this comes up again in the future.

@JacksonBurns JacksonBurns merged commit 8743e9d into main Jul 10, 2023
17 checks passed
@JacksonBurns JacksonBurns deleted the update_performance_test branch July 10, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants