Adds MLE estimates on gather command #495
openfecli/commands/gather.py
Outdated
# 5a) write out MLE values
for ligA, DG, unc_DG in MLEs:
    DG, unc_DG = dp2(DG), dp2(unc_DG)
    output.write(f'DGbind({ligA})\tDG(MLE)\tZero\t{ligA}\t{DG}\t{unc_DG}\n')
@hannahbaumann this will print a line like:
measurement	type	ligand_i	ligand_j	estimate (kcal/mol)	uncertainty (kcal/mol)
DGbind(lig_1234)	DG(mle)	Zero	lig_1234	-4.2	0.23
Would this make sense to you? "Zero" is meant to be the zero-state reference point; I'm not sure if there's a better label for that.
It's not a zero state dG though? Unless you normalise against experiment your reference state is the first ligand passed to your MLE - or are you saying that "Zero" is followed by the reference state?
In any case, if it's possible to inspect the original SMCs for stored experimental values this becomes a lot more useful. Otherwise what I would do is return a list of values with the leading entry being the 0 kcal/mol entry. That way it's clear what the central point is.
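The "leading entry is the 0 kcal/mol entry" suggestion can be sketched as below. This is only an illustration, not code from the PR: `shift_to_reference`, the ligand names, and the `(DG, uncertainty)` dictionary layout are all hypothetical, and uncertainties are simply passed through rather than re-derived for the shifted values.

```python
def shift_to_reference(estimates, reference):
    """Shift DG estimates so `reference` sits at 0.0 and is listed first.

    `estimates` maps a ligand name to a (DG, uncertainty) pair in kcal/mol.
    Assumption: uncertainties are copied unchanged; a real implementation
    may need to treat the uncertainty of the reference differently.
    """
    offset, ref_unc = estimates[reference]
    # The reference ligand leads the list so the central point is obvious.
    shifted = [(reference, 0.0, ref_unc)]
    for name, (dg, unc) in estimates.items():
        if name != reference:
            shifted.append((name, dg - offset, unc))
    return shifted


# Hypothetical values, chosen only to make the arithmetic easy to follow.
example = {"lig_a": (-4.0, 0.25), "lig_b": (-6.5, 0.5)}
rows = shift_to_reference(example, "lig_a")
```

Here `rows[0]` is the reference ligand at 0.0, and every other entry is reported relative to it.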
Yes, from that print statement I wouldn't know what "Zero" means, i.e. what the reference is, and therefore what meaning the ("relative") dG has. Or is that described elsewhere?
Rather than trying to cram everything into one table, why don't we make gather take flags to indicate what data to output? We're really offering 3 different tables. Instead of trying to force them into a single table, why not just own it and offer 3 tables?
Gather is also pretty fast, isn't it? So I don't think we save that much time by only walking the filesystem once.
Something like:
$ openfe gather --report ddg results/
$ openfe gather --report dg results/ # MLE, this is probably the default
$ openfe gather --report leg results/ # raw leg DG
If the theory is that we're expecting someone to parse this after, the current approach requires understanding what the special columns in our format mean, which, IMO, takes more teaching than a CLI option that switches between more intuitive output formats for each type of data.
To implement: https://click.palletsprojects.com/en/8.1.x/options/#choice-options
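A minimal sketch of how the proposed `--report` flag could be wired up with click's `Choice` type. The command body here is a hypothetical stand-in that only echoes its inputs; the real `gather` command does more, and the help text and argument name `results_dir` are assumptions.

```python
import click


@click.command()
@click.option(
    "--report",
    type=click.Choice(["dg", "ddg", "leg"]),
    default="dg",  # the comment above suggests MLE DG as the likely default
    show_default=True,
    help="Which table to write: MLE DG, DDG, or raw leg DG.",
)
@click.argument("results_dir", type=click.Path())
def gather(report, results_dir):
    """Hypothetical stand-in for `openfe gather`: echoes the selection."""
    click.echo(f"report={report} dir={results_dir}")
```

With this shape, click itself rejects any value outside the three choices, so each report type can have its own simple table layout without special columns.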
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
- Coverage   92.07%   91.90%   -0.18%
==========================================
  Files         113      116       +3
  Lines        6938     6952      +14
==========================================
+ Hits         6388     6389       +1
- Misses        550      563      +13
Most of the missing coverage would get hit if we had RHFE example data to test. But it is still good enough that codecov isn't complaining. Should be ready for review.
Raising my own concerns on this as of now:
I'm gone for a week as of now (well, a few of you might be able to get me in the next 13 hours), so I'm logging these concerns so someone else can carry them through and fix problems in the PR.
Here is a cinnabar plot of the DG values (after centering, as done in cinnabar). I would say this looks reasonable?!
make room for MLEs to be at top of output
now to update to make output cleaner
Co-authored-by: Irfan Alibay <[email protected]>
8c36577 to 0b4d51a
    t = tarfile.open(f, mode='r')
    t.extractall('.')

    yield


_EXPECTED_DG = b"""
@hannahbaumann are you happy with how these three outputs look?
Yes, that looks good to me!
@dwhswenson re: there's no ligand at DG=0.0: I think that's an error in the cinnabar MLE docs. I think it returns values with a mean of 0.0, rather than one ligand arbitrarily being 0.0.
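To illustrate why mean-centered values need not contain a 0.0 entry, here is a small sketch. It is not cinnabar's code, and whether cinnabar centers its MLE output this way is the commenter's tentative reading; `center_on_mean` and the numbers are hypothetical.

```python
def center_on_mean(values):
    """Shift a list of DG values so that their mean is 0.0."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical DG values (kcal/mol) relative to some arbitrary reference.
raw = [-1.0, 0.0, 4.0]
centered = center_on_mean(raw)
```

After centering, the values sum to zero and pairwise differences are unchanged, but no individual ligand sits at exactly 0.0, which matches the observation in the comment above.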
Developers certificate of origin