Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'raw' gather report should output all PU repeats #884

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

frannerin
Copy link

@frannerin frannerin commented Jul 5, 2024

Function _parse_raw_units in openfecli/commands/gather.py only processed first ProtocolUnit repeat and then _write_raw did not distinguish between repeats if there were any. _write_dg_raw is left unchanged because it seems to be unused.

Example output (openfe gather --report raw ) before fix for a RBFE protocol with 3 PU repeats for the solvent legs, and 1 repeat for the complex legs:

leg ligand_i ligand_j DG(i->j) (kcal/mol) MBAR uncertainty (kcal/mol)
complex lig_ejm_31 lig_ejm_42 -14.47 0.04
solvent lig_ejm_31 lig_ejm_42 -14.89 0.03
complex lig_ejm_31 lig_ejm_46 -33.90 0.05
solvent lig_ejm_31 lig_ejm_46 -32.75 0.04
complex lig_ejm_31 lig_ejm_47 -20.77 0.07
solvent lig_ejm_31 lig_ejm_47 -20.72 0.05
complex lig_ejm_31 lig_ejm_48 -14.19 0.07
solvent lig_ejm_31 lig_ejm_48 -14.48 0.06
complex lig_ejm_31 lig_ejm_50 -49.55 0.04
solvent lig_ejm_31 lig_ejm_50 -50.07 0.04
complex lig_ejm_42 lig_ejm_43 -11.47 0.05
solvent lig_ejm_42 lig_ejm_43 -12.95 0.03
complex lig_ejm_46 lig_jmc_23 8.96 0.02
solvent lig_ejm_46 lig_jmc_23 9.37 0.02
complex lig_ejm_46 lig_jmc_27 10.54 0.03
solvent lig_ejm_46 lig_jmc_27 11.03 0.03
complex lig_ejm_46 lig_jmc_28 16.03 0.03
solvent lig_ejm_46 lig_jmc_28 16.16 0.03

Example output after fix:

leg repeat ligand_i ligand_j DG(i->j) (kcal/mol) MBAR uncertainty (kcal/mol)
complex 1 lig_ejm_31 lig_ejm_42 -14.47 0.04
solvent 1 lig_ejm_31 lig_ejm_42 -14.89 0.03
solvent 2 lig_ejm_31 lig_ejm_42 -14.91 0.03
solvent 3 lig_ejm_31 lig_ejm_42 -14.90 0.03
complex 1 lig_ejm_31 lig_ejm_46 -33.90 0.05
solvent 1 lig_ejm_31 lig_ejm_46 -32.75 0.04
solvent 2 lig_ejm_31 lig_ejm_46 -32.75 0.05
solvent 3 lig_ejm_31 lig_ejm_46 -32.78 0.04
complex 1 lig_ejm_31 lig_ejm_47 -20.77 0.07
solvent 1 lig_ejm_31 lig_ejm_47 -20.72 0.05
solvent 2 lig_ejm_31 lig_ejm_47 -20.78 0.05
solvent 3 lig_ejm_31 lig_ejm_47 -20.75 0.05
complex 1 lig_ejm_31 lig_ejm_48 -14.19 0.07
solvent 1 lig_ejm_31 lig_ejm_48 -14.48 0.06
solvent 2 lig_ejm_31 lig_ejm_48 -14.51 0.06
solvent 3 lig_ejm_31 lig_ejm_48 -14.50 0.06
complex 1 lig_ejm_31 lig_ejm_50 -49.55 0.04
solvent 1 lig_ejm_31 lig_ejm_50 -50.07 0.04
solvent 2 lig_ejm_31 lig_ejm_50 -49.97 0.04
solvent 3 lig_ejm_31 lig_ejm_50 -50.08 0.04
complex 1 lig_ejm_42 lig_ejm_43 -11.47 0.05
solvent 1 lig_ejm_42 lig_ejm_43 -12.95 0.03
solvent 2 lig_ejm_42 lig_ejm_43 -13.03 0.03
solvent 3 lig_ejm_42 lig_ejm_43 -13.01 0.03
complex 1 lig_ejm_46 lig_jmc_23 8.96 0.02
solvent 1 lig_ejm_46 lig_jmc_23 9.37 0.02
solvent 2 lig_ejm_46 lig_jmc_23 9.35 0.02
solvent 3 lig_ejm_46 lig_jmc_23 9.37 0.02
complex 1 lig_ejm_46 lig_jmc_27 10.54 0.03
solvent 1 lig_ejm_46 lig_jmc_27 11.03 0.03
solvent 2 lig_ejm_46 lig_jmc_27 11.09 0.03
solvent 3 lig_ejm_46 lig_jmc_27 11.08 0.03
complex 1 lig_ejm_46 lig_jmc_28 16.03 0.03
solvent 1 lig_ejm_46 lig_jmc_28 16.16 0.03
solvent 2 lig_ejm_46 lig_jmc_28 16.17 0.04
solvent 3 lig_ejm_46 lig_jmc_28 16.20 0.03

Checklist

  • Added a news entry

Developers certificate of origin

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @frannerin - just a couple of things.

I think there's that one test that will fail and will require manually adding in the repeats to the stored table string.

return [(pu['outputs']['unit_estimate'],
pu['outputs']['unit_estimate_error'])
# could add to each tuple pu[0]["source_key"] for repeat ID
return [(pu[0]['outputs']['unit_estimate'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you maybe provide some context for this change?

As far as I remember, the zero list index on line 134 is to deal with avoiding to do it here twice - that being said I could be convinced that this is a better approach with the right context.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list(results['protocol_result']['data'].values()) in line 134 contains an item for each ProtocolUnit, so by taking the zero list index there only the first PU is further processed. Each item in list(results['protocol_result']['data'].values()) is a list itself, with length 1 (afaik), so that's why pu[0] must be done to access the single dictionary within.

image


for ligpair, vals in sorted(legs.items()):
for simtype, repeats in sorted(vals.items()):
for m, u in repeats:
for rep, (m, u) in enumerate(repeats, 1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd want this to be zero indexed - I'll have a poke around our other outputs later, but at least on the Python layer, we keep everything zero indexed.

@frannerin
Copy link
Author

I did the zero-indexed repeats thing and updated the test_gather.py. I ran pytest -k test_gather and got: 23 passed, 3 skipped, 840 deselected, 1 xfailed, 2 xpassed, 25 warnings

Copy link

codecov bot commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.64%. Comparing base (83028b1) to head (a1edc4a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #884      +/-   ##
==========================================
- Coverage   94.59%   91.64%   -2.95%     
==========================================
  Files         134      134              
  Lines        9935     9935              
==========================================
- Hits         9398     9105     -293     
- Misses        537      830     +293     
Flag Coverage Δ
fast-tests 91.64% <100.00%> (?)
slow-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@atravitz atravitz added bug Something isn't working cli command-line interface labels Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cli command-line interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants