'raw' gather report should output all PU repeats #884

frannerin · 2024-07-05T09:48:45Z

The function _parse_raw_units in openfecli/commands/gather.py only processed the first ProtocolUnit repeat, and _write_raw did not distinguish between repeats when there was more than one. _write_dg_raw is left unchanged because it seems to be unused.

Example output (openfe gather --report raw) before the fix, for an RBFE protocol with 3 PU repeats for the solvent legs and 1 repeat for the complex legs:

leg	ligand_i	ligand_j	DG(i->j) (kcal/mol)	MBAR uncertainty (kcal/mol)
complex	lig_ejm_31	lig_ejm_42	-14.47	0.04
solvent	lig_ejm_31	lig_ejm_42	-14.89	0.03
complex	lig_ejm_31	lig_ejm_46	-33.90	0.05
solvent	lig_ejm_31	lig_ejm_46	-32.75	0.04
complex	lig_ejm_31	lig_ejm_47	-20.77	0.07
solvent	lig_ejm_31	lig_ejm_47	-20.72	0.05
complex	lig_ejm_31	lig_ejm_48	-14.19	0.07
solvent	lig_ejm_31	lig_ejm_48	-14.48	0.06
complex	lig_ejm_31	lig_ejm_50	-49.55	0.04
solvent	lig_ejm_31	lig_ejm_50	-50.07	0.04
complex	lig_ejm_42	lig_ejm_43	-11.47	0.05
solvent	lig_ejm_42	lig_ejm_43	-12.95	0.03
complex	lig_ejm_46	lig_jmc_23	8.96	0.02
solvent	lig_ejm_46	lig_jmc_23	9.37	0.02
complex	lig_ejm_46	lig_jmc_27	10.54	0.03
solvent	lig_ejm_46	lig_jmc_27	11.03	0.03
complex	lig_ejm_46	lig_jmc_28	16.03	0.03
solvent	lig_ejm_46	lig_jmc_28	16.16	0.03

Example output after fix:

leg	repeat	ligand_i	ligand_j	DG(i->j) (kcal/mol)	MBAR uncertainty (kcal/mol)
complex	1	lig_ejm_31	lig_ejm_42	-14.47	0.04
solvent	1	lig_ejm_31	lig_ejm_42	-14.89	0.03
solvent	2	lig_ejm_31	lig_ejm_42	-14.91	0.03
solvent	3	lig_ejm_31	lig_ejm_42	-14.90	0.03
complex	1	lig_ejm_31	lig_ejm_46	-33.90	0.05
solvent	1	lig_ejm_31	lig_ejm_46	-32.75	0.04
solvent	2	lig_ejm_31	lig_ejm_46	-32.75	0.05
solvent	3	lig_ejm_31	lig_ejm_46	-32.78	0.04
complex	1	lig_ejm_31	lig_ejm_47	-20.77	0.07
solvent	1	lig_ejm_31	lig_ejm_47	-20.72	0.05
solvent	2	lig_ejm_31	lig_ejm_47	-20.78	0.05
solvent	3	lig_ejm_31	lig_ejm_47	-20.75	0.05
complex	1	lig_ejm_31	lig_ejm_48	-14.19	0.07
solvent	1	lig_ejm_31	lig_ejm_48	-14.48	0.06
solvent	2	lig_ejm_31	lig_ejm_48	-14.51	0.06
solvent	3	lig_ejm_31	lig_ejm_48	-14.50	0.06
complex	1	lig_ejm_31	lig_ejm_50	-49.55	0.04
solvent	1	lig_ejm_31	lig_ejm_50	-50.07	0.04
solvent	2	lig_ejm_31	lig_ejm_50	-49.97	0.04
solvent	3	lig_ejm_31	lig_ejm_50	-50.08	0.04
complex	1	lig_ejm_42	lig_ejm_43	-11.47	0.05
solvent	1	lig_ejm_42	lig_ejm_43	-12.95	0.03
solvent	2	lig_ejm_42	lig_ejm_43	-13.03	0.03
solvent	3	lig_ejm_42	lig_ejm_43	-13.01	0.03
complex	1	lig_ejm_46	lig_jmc_23	8.96	0.02
solvent	1	lig_ejm_46	lig_jmc_23	9.37	0.02
solvent	2	lig_ejm_46	lig_jmc_23	9.35	0.02
solvent	3	lig_ejm_46	lig_jmc_23	9.37	0.02
complex	1	lig_ejm_46	lig_jmc_27	10.54	0.03
solvent	1	lig_ejm_46	lig_jmc_27	11.03	0.03
solvent	2	lig_ejm_46	lig_jmc_27	11.09	0.03
solvent	3	lig_ejm_46	lig_jmc_27	11.08	0.03
complex	1	lig_ejm_46	lig_jmc_28	16.03	0.03
solvent	1	lig_ejm_46	lig_jmc_28	16.16	0.03
solvent	2	lig_ejm_46	lig_jmc_28	16.17	0.04
solvent	3	lig_ejm_46	lig_jmc_28	16.20	0.03
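
As a rough illustration of what the new repeat column allows, the sketch below groups the rows of a raw report by leg and ligand pair. It assumes the report is tab-separated with the header shown above; the function name group_repeats and the embedded three-edge excerpt are illustrative only and are not part of openfe.

```python
import csv
import io
from collections import defaultdict

# Excerpt of the "after fix" report above, assumed to be tab-separated.
report = (
    "leg\trepeat\tligand_i\tligand_j\tDG(i->j) (kcal/mol)\tMBAR uncertainty (kcal/mol)\n"
    "complex\t1\tlig_ejm_31\tlig_ejm_42\t-14.47\t0.04\n"
    "solvent\t1\tlig_ejm_31\tlig_ejm_42\t-14.89\t0.03\n"
    "solvent\t2\tlig_ejm_31\tlig_ejm_42\t-14.91\t0.03\n"
    "solvent\t3\tlig_ejm_31\tlig_ejm_42\t-14.90\t0.03\n"
)


def group_repeats(text):
    """Collect the DG value of every repeat per (leg, ligand_i, ligand_j).

    Hypothetical helper for illustration; not an openfe function.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = (row["leg"], row["ligand_i"], row["ligand_j"])
        grouped[key].append(float(row["DG(i->j) (kcal/mol)"]))
    return dict(grouped)


for key, values in group_repeats(report).items():
    print(key, values)
```

Before the fix, each key would only ever have held a single value, since only the first repeat was written.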

Checklist

Added a news entry

Developers certificate of origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

IAlibay

Thanks @frannerin - just a couple of things.

I think there's one test that will fail and will need the repeats added manually to the stored table string.

IAlibay · 2024-07-05T10:39:05Z

openfecli/commands/gather.py

-    return [(pu['outputs']['unit_estimate'],
-             pu['outputs']['unit_estimate_error'])
+    # could add to each tuple pu[0]["source_key"] for repeat ID
+    return [(pu[0]['outputs']['unit_estimate'],


Could you maybe provide some context for this change?

As far as I remember, the zero list index on line 134 is there to avoid doing the indexing twice here. That said, I could be convinced that this is a better approach given the right context.

list(results['protocol_result']['data'].values()) on line 134 contains one item per ProtocolUnit, so taking the zero list index there means only the first PU is processed further. Each item in that list is itself a list of length 1 (afaik), which is why pu[0] is needed to access the single dictionary inside.
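
A minimal sketch of the structure described above may help. The key names protocol_result, data, outputs, unit_estimate and unit_estimate_error come from the diff; everything else is an assumption for illustration: the outer keys ("repeat-a" and so on), the numbers, the use of plain floats instead of quantities with units, and the two function names.

```python
# Hypothetical stand-in for one loaded result file. Each value under "data"
# is a list of length 1 holding the dictionary for one ProtocolUnit repeat.
results = {
    "protocol_result": {
        "data": {
            "repeat-a": [{"outputs": {"unit_estimate": -14.89,
                                      "unit_estimate_error": 0.03}}],
            "repeat-b": [{"outputs": {"unit_estimate": -14.91,
                                      "unit_estimate_error": 0.03}}],
            "repeat-c": [{"outputs": {"unit_estimate": -14.90,
                                      "unit_estimate_error": 0.03}}],
        }
    }
}


def parse_first_only(results):
    """Behaviour before the change, as described in this thread:
    the [0] on the outer list keeps only the first repeat."""
    pus = list(results["protocol_result"]["data"].values())[0]
    return [(pu["outputs"]["unit_estimate"],
             pu["outputs"]["unit_estimate_error"])
            for pu in pus]


def parse_all_repeats(results):
    """Behaviour after the change: keep every repeat and use pu[0]
    to reach the single dictionary inside each inner list."""
    pus = list(results["protocol_result"]["data"].values())
    return [(pu[0]["outputs"]["unit_estimate"],
             pu[0]["outputs"]["unit_estimate_error"])
            for pu in pus]


print(len(parse_first_only(results)))   # one (estimate, error) pair
print(len(parse_all_repeats(results)))  # one pair per repeat
```

If an inner list could ever hold more than one dictionary, pu[0] would silently drop the rest, so the length-1 assumption is worth confirming against real result files.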

IAlibay · 2024-07-05T10:40:24Z

openfecli/commands/gather.py


    for ligpair, vals in sorted(legs.items()):
        for simtype, repeats in sorted(vals.items()):
-            for m, u in repeats:
+            for rep, (m, u) in enumerate(repeats, 1):


I think we'd want this to be zero indexed - I'll have a poke around our other outputs later, but at least on the Python layer, we keep everything zero indexed.
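
For reference, the difference between the two numbering choices comes down to the start argument of enumerate. The sketch below mirrors the loop in the diff; the legs dictionary and the raw_rows helper are hypothetical and only stand in for what _write_raw iterates over.

```python
# Hypothetical input: ligand pair -> simulation type -> list of (DG, error).
legs = {
    ("lig_ejm_31", "lig_ejm_42"): {
        "complex": [(-14.47, 0.04)],
        "solvent": [(-14.89, 0.03), (-14.91, 0.03), (-14.90, 0.03)],
    }
}


def raw_rows(legs, start=0):
    """Build one row per repeat; start=0 gives zero-indexed repeat IDs,
    start=1 reproduces the numbering first proposed in the diff."""
    rows = []
    for ligpair, vals in sorted(legs.items()):
        for simtype, repeats in sorted(vals.items()):
            for rep, (m, u) in enumerate(repeats, start):
                rows.append((simtype, rep, *ligpair, m, u))
    return rows


for row in raw_rows(legs):
    print("\t".join(str(field) for field in row))
```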

frannerin · 2024-07-05T11:53:21Z

I made the repeats zero-indexed and updated test_gather.py. Running pytest -k test_gather gave: 23 passed, 3 skipped, 840 deselected, 1 xfailed, 2 xpassed, 25 warnings

codecov · 2024-10-15T21:00:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.64%. Comparing base (83028b1) to head (a1edc4a).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #884      +/-   ##
==========================================
- Coverage   94.59%   91.64%   -2.95%     
==========================================
  Files         134      134              
  Lines        9935     9935              
==========================================
- Hits         9398     9105     -293     
- Misses        537      830     +293

Flag	Coverage Δ
fast-tests	`91.64% <100.00%> (?)`
slow-tests	`?`

'raw' gather report now outputs all PU repeats

744c49c

IAlibay requested changes Jul 5, 2024

frannerin added 2 commits July 5, 2024 12:44

zero-indexed repeats output

351e199

updated test_gather raw to new output

cf4eb72

frannerin mentioned this pull request Aug 19, 2024

Use MBAR error as uncertainty with a single protocol repeat in RBFE #883

Open

2 tasks

mikemhenry added 2 commits September 25, 2024 11:39

Merge branch 'main' into rbfe-raw-report

e3d4c8d

Merge branch 'main' into rbfe-raw-report

a1edc4a

jthorton mentioned this pull request Oct 18, 2024

Trouble reconciling results from json with openfe gather --report raw #925

Open

atravitz mentioned this pull request Oct 18, 2024

fixing indexing to include replicates #967

Open

1 task

atravitz added the bug and cli labels Oct 18, 2024
