Update `OptimizationResultCollection.create_basic_dataset` to preserve molecule IDs #303

ntBre · 2024-10-15T21:58:52Z

Description

This PR is a long-term fix for #297. In short, the current `OptimizationResultCollection.create_basic_dataset` implementation creates a new dataset by round-tripping through OpenFF Molecules. This interferes with the hashing/equivalence checks in QCArchive, so "the same molecules" end up being considered different. In turn, `OptimizationResultCollection.to_basic_result_collection` returns fewer records than expected.

The fix is simply to pass the qcelemental `Molecule` objects along directly instead of reconstructing OpenFF Molecules. As it turns out, this was essentially already built into qcsubmit and only required a two-line change to some keyword arguments in the `BasicDataset.add_molecule` call.
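For anyone skimming, here is a toy sketch of the failure mode described above. It does not use qcsubmit or QCArchive at all: `toy_hash`, `round_trip`, and the record layout are hypothetical stand-ins, and the real hashing in QCArchive may work differently. It only illustrates why rebuilding an object through a lossy conversion can change a content-derived ID, while passing the original object along cannot.

```python
import hashlib
import json


def toy_hash(record: dict) -> str:
    """Hash every field of a record, in the spirit of a content-derived ID.

    Hypothetical stand-in; not the hashing scheme QCArchive actually uses.
    """
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()


def round_trip(record: dict) -> dict:
    """Hypothetical lossy conversion: rebuild the record from a subset of fields."""
    return {"symbols": record["symbols"], "geometry": record["geometry"]}


# A made-up record standing in for a stored molecule.
original = {
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, 0.0, 0.0, 0.0, 1.8, 1.7, 0.0, -0.5],
    "extras": {"source": "optimization"},
}

rebuilt = round_trip(original)  # analogous to reconstructing the molecule
passed_through = original       # analogous to passing the object along directly

ids_match_after_round_trip = toy_hash(rebuilt) == toy_hash(original)
ids_match_when_passed_directly = toy_hash(passed_through) == toy_hash(original)
```

In this toy setup the rebuilt record hashes differently because a field was dropped along the way, so a lookup by the original ID would miss it. Passing the original through keeps the ID stable by construction.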

Todos

Update create_basic_dataset to preserve molecule hashes from the optimization dataset

Status

Ready to go

codecov · 2024-10-16T19:57:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (cc3f23e) to head (bd3bf3b).

ntBre · 2024-10-17T18:27:49Z

Well, that turned out to be a pretty easy fix. The hardest part was adding a new mock `get_molecules` method to the `_PortalClient` test fixture. The main limitation now is that the new test pulls down a full dataset from a live QCArchive instance. I can probably pull out a small subset of the real dataset to at least speed that up.
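A rough sketch of the mocking idea, for context. This is not the actual `_PortalClient` fixture from the test suite, which is not shown here and may look quite different. The canned records, `fake_get_molecules`, and the assumed call shape (a list of IDs in, records out in the same order) are all illustrative assumptions, built only on `unittest.mock` from the standard library.

```python
from unittest.mock import MagicMock

# Canned molecule records keyed by ID (made-up data for illustration).
canned_molecules = {
    1: {"id": 1, "symbols": ["H", "H"]},
    2: {"id": 2, "symbols": ["O", "H", "H"]},
}


def fake_get_molecules(molecule_ids):
    """Return canned records in the order the IDs were requested.

    Assumed call shape; the real client method may accept more arguments.
    """
    return [canned_molecules[i] for i in molecule_ids]


# Stand-in client object: any call to get_molecules is routed to the fake,
# so no network access is needed.
client = MagicMock()
client.get_molecules.side_effect = fake_get_molecules

fetched = client.get_molecules([2, 1])
fetched_ids = [molecule["id"] for molecule in fetched]
```

The point of routing through `side_effect` is that the test can still assert how the method was called (for example with `client.get_molecules.assert_called_once_with([2, 1])`) while the returned data stays fixed and offline.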

ntBre added 4 commits October 15, 2024 17:40

add failing test case based on #297

556853f

fix test for #297 but break other create_basic_dataset test

42a711a

mock get_molecules for test_optimization_create_basic_dataset

d4cfcb2

update doc string for new test

2e2e3d9

minimize example dataset from #297

bd3bf3b

ntBre marked this pull request as ready for review October 17, 2024 19:52

ntBre requested a review from j-wags October 17, 2024 20:00
