Update OptimizationResultCollection.create_basic_dataset to preserve molecule IDs #303

Open: ntBre wants to merge 5 commits into main

ntBre (Contributor) commented Oct 15, 2024

Description

The goal of this PR is a long-term fix for #297. In short, the current OptimizationResultCollection.create_basic_dataset implementation creates a new dataset by round-tripping through OpenFF Molecules. This interferes with the hashing/equivalence checks in QCArchive, so molecules that should be "the same" are considered different. In turn, this causes OptimizationResultCollection.to_basic_result_collection to return fewer records than expected.

The fix is simply to pass qcelemental Molecule objects along directly instead of reconstructing OpenFF Molecules. As it turns out, this was essentially already built into qcsubmit and just required a two-line change to update some keyword arguments in the BasicDataset.add_molecule call.
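To make the failure mode concrete, here is a toy sketch of why a round trip can break hash-based matching. It does not use qcsubmit, qcelemental, or the OpenFF toolkit: `ToyMolecule`, `round_trip`, and `build_dataset` are hypothetical names, and the coordinate rounding is only an assumed stand-in for whatever the real conversion changes.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for a stored molecule (not the qcelemental class)."""

    symbols: Tuple[str, ...]
    geometry: Tuple[float, ...]  # flat list of coordinates

    def get_hash(self) -> str:
        # Hash the exact stored contents, so any change yields a new identity.
        payload = json.dumps({"symbols": self.symbols, "geometry": self.geometry})
        return hashlib.sha1(payload.encode()).hexdigest()


def round_trip(mol: ToyMolecule, ndigits: int = 6) -> ToyMolecule:
    """Assumed lossy conversion: rebuild the molecule with rounded coordinates."""
    return ToyMolecule(mol.symbols, tuple(round(x, ndigits) for x in mol.geometry))


def build_dataset(molecules: Iterable[ToyMolecule], preserve_ids: bool) -> Set[str]:
    """Collect the hashes a new dataset would store for the given molecules."""
    hashes = set()
    for mol in molecules:
        stored = mol if preserve_ids else round_trip(mol)
        hashes.add(stored.get_hash())
    return hashes


source = [
    ToyMolecule(
        ("O", "H", "H"),
        (0.0, 0.0, 0.1173456789,
         0.0, 0.7572345678, -0.4692345678,
         0.0, -0.7572345678, -0.4692345678),
    )
]
source_hashes = {mol.get_hash() for mol in source}

# Number of source molecules that can still be matched by hash in each case.
matched_after_round_trip = len(source_hashes & build_dataset(source, preserve_ids=False))
matched_when_preserved = len(source_hashes & build_dataset(source, preserve_ids=True))
print(matched_after_round_trip, matched_when_preserved)
```

In this toy model, the rebuilt molecule no longer matches its source record, while the molecule passed through untouched does, which mirrors the missing records described above.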

Todos

  • Update create_basic_dataset to preserve molecule hashes from the optimization dataset

Status

  • Ready to go

codecov bot commented Oct 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (cc3f23e) to head (bd3bf3b).

ntBre (Contributor, Author) commented Oct 17, 2024

Well, that turned out to be a pretty easy fix. The hardest part was adding a new mock get_molecules method to the _PortalClient test fixture. The main limitation now is just that the new test is pulling down a full dataset from a live QCArchive instance. I think I can probably pull out a small subset of the real dataset to speed that up at least.
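For readers unfamiliar with this kind of fixture, a mock `get_molecules` method might look roughly like the sketch below. This is not the code added in this PR: `FakePortalClient` and its in-memory records are hypothetical, and the `(molecule_ids, missing_ok)` signature is an assumption about the client method being mimicked.

```python
from typing import Dict, List, Optional, Sequence, Union


class FakePortalClient:
    """Hypothetical in-memory stand-in for a portal client test fixture."""

    def __init__(self, molecules: Dict[int, dict]):
        # Map of molecule ID -> stored molecule payload.
        self._molecules = molecules

    def get_molecules(
        self,
        molecule_ids: Union[int, Sequence[int]],
        missing_ok: bool = False,
    ) -> Union[Optional[dict], List[Optional[dict]]]:
        # Accept a single ID or a sequence of IDs (assumed behaviour).
        single = isinstance(molecule_ids, int)
        ids = [molecule_ids] if single else list(molecule_ids)

        found = [self._molecules.get(mol_id) for mol_id in ids]
        missing = [mol_id for mol_id, mol in zip(ids, found) if mol is None]
        if missing and not missing_ok:
            raise KeyError(f"molecules not found: {missing}")

        # Return results in the same order as the requested IDs.
        return found[0] if single else found


client = FakePortalClient(
    {
        1: {"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]},
        2: {"symbols": ["Ne"], "geometry": [0.0, 0.0, 0.0]},
    }
)
first = client.get_molecules(1)
both = client.get_molecules([2, 1])
```

Because the records live in memory, a test built on a fixture like this does not need to contact a live QCArchive instance.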

ntBre marked this pull request as ready for review on October 17, 2024 19:52
ntBre requested a review from j-wags on October 17, 2024 20:00