Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds MLE estimates on gather command #495

Merged
merged 16 commits into from
Aug 24, 2023
Merged

Adds MLE estimates on gather command #495

merged 16 commits into from
Aug 24, 2023

Conversation

richardjgowers
Copy link
Contributor

@richardjgowers richardjgowers commented Jul 19, 2023

Developers certificate of origin

@richardjgowers richardjgowers changed the title Mle on gather [wip] Mle on gather Jul 19, 2023
@richardjgowers richardjgowers changed the title [wip] Mle on gather Adds MLE estimates on gather command Jul 20, 2023
# 5a) write out MLE values
for ligA, DG, unc_DG in MLEs:
DG, unc_DG = dp2(DG), dp2(unc_DG)
output.write(f'DGbind({ligA})\tDG(MLE)\tZero\t{ligA}\t{DG}\t{unc_DG}\n')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahbaumann this will print a line like:

measurement type ligand_i ligand_j estimate (kcal/mol) uncertainty (kcal/mol)
DGbind(lig_1234) DG(mle) Zero lig_1234 -4.2 0.23

would this make sense to you? "zero" is meant to be the zero state reference point, not sure if there's a better label for that

Copy link
Member

@IAlibay IAlibay Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a zero state dG though? Unless you normalise against experiment your reference state is the first ligand passed to your MLE - or are you saying that "Zero" is followed by the reference state?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, if it's possible to inspect the original SMCs for stored experimental values this becomes a lot more useful. Otherwise what I would do is return a list of values with the leading entry being the 0 kcal/mol entry. That way it's clear what the central point is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think from that print statement I wouldn't know what zero means, so what the reference is and therefore what meaning the ("relative") dG has. Or is that described elsewhere?

Copy link
Member

@dwhswenson dwhswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than trying to cram everything into one table, why don't we make gather take flags to indicate what data to output? We're really offering 3 different tables. Instead of trying for force them into a single table, why not just own it and offer 3 tables?

Gather is also pretty fast, isn't it? So I don't think we save that much time by only walking the filesystem once.

Something like:

$ openfe gather --report ddg results/
$ openfe gather --report dg results/  # MLE, this is probably the default
$ openfe gather --report leg results/  # raw leg DG

If the theory is that we're expecting someone to parse this after, the current approach requires understanding what the special columns in our format mean, which, IMO, takes more teaching than a CLI option that switches between more intuitive output formats for each type of data.

To implement: https://click.palletsprojects.com/en/8.1.x/options/#choice-options

@dwhswenson dwhswenson mentioned this pull request Jul 24, 2023
1 task
@IAlibay IAlibay added this to the Release 0.12.0 milestone Jul 31, 2023
@codecov
Copy link

codecov bot commented Aug 2, 2023

Codecov Report

Patch coverage: 93.10% and project coverage change: -0.18% ⚠️

Comparison is base (e797d52) 92.07% compared to head (8c36577) 91.90%.
Report is 1 commits behind head on main.

❗ Current head 8c36577 differs from pull request most recent head 0b4d51a. Consider uploading reports for the commit 0b4d51a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
- Coverage   92.07%   91.90%   -0.18%     
==========================================
  Files         113      116       +3     
  Lines        6938     6952      +14     
==========================================
+ Hits         6388     6389       +1     
- Misses        550      563      +13     
Files Changed Coverage Δ
openfecli/commands/gather.py 85.25% <90.19%> (-6.79%) ⬇️
openfecli/clicktypes/__init__.py 100.00% <100.00%> (ø)
openfecli/clicktypes/hyphenchoice.py 100.00% <100.00%> (ø)
openfecli/tests/clicktypes/test_hyphenchoice.py 100.00% <100.00%> (ø)
openfecli/tests/commands/test_gather.py 100.00% <100.00%> (ø)
openfecli/tests/test_rbfe_tutorial.py 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dwhswenson
Copy link
Member

Most of the missing coverage would get hit if we had RHFE example data to test. But still is still good enough that codecov isn't complaining. Should be ready for review.

@dwhswenson
Copy link
Member

dwhswenson commented Aug 4, 2023

Raising my own concerns on this as of now:

  • in the MLE DG estimate, reference ligand doesn't seem to show exactly 0.0 value. This makes things ambiguous and suggests a problem. I'm highly concerned about this.
  • I haven't seen anyone do a proper cinnabar plot of numbers from this vs. experiment. There's no check in anything in this PR to ensure that our numbers are actually accurate.

I'm gone for a week as of now (well, a few of you might be able to get me in the next 13 hours.) So I'm logging these concerns so someone else can carry them through and fix problems in the PR.

@hannahbaumann
Copy link
Contributor

hannahbaumann commented Aug 4, 2023

Here a cinnabar plot of the DG values (after centralizing as done in cinnabar). I would say this looks reasonable?!
https://drive.google.com/file/d/1VJnrpbIOIRjDRS_2nrOhEMqLs71Fj4K2/view?usp=sharing
This was created using the data from the opencli directory

t = tarfile.open(f, mode='r')
t.extractall('.')

yield

_EXPECTED_DG = b"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hannahbaumann are you happy with how these three outputs look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that looks good to me!

@richardjgowers
Copy link
Contributor Author

@dwhswenson re: there's no ligand at DG=0.0, I think that's an error in the cinnabar MLE docs, I think it returns values with a mean of 0.0, rather than one ligand arbitrarily being 0.0

@richardjgowers richardjgowers merged commit b027cff into main Aug 24, 2023
6 of 7 checks passed
@richardjgowers richardjgowers deleted the MLE_on_gather branch August 24, 2023 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants