Use revised Pareto k threshold #2349
"good_k" value is computed once and stored in the ELPDData. I am open to suggestions for a better name. |
if isinstance(khats, ELPDData):
    good_k = khats.good_k
    khats = khats.pareto_k
if isinstance(khats, ELPDData):
    good_k = khats.good_k
    khats = khats.pareto_k
else:
    good_k = None
    warnings.warn()
This should be something like the above instead. Right now, DataArrays are also valid input because they are array-like, but they carry more information than NumPy arrays do, so they are treated more like ELPDData input. I think this is a reason for some of the test failures.
Note: also rebase on main to avoid unrelated failures that have been fixed already.
arviz/stats/stats.py
"importance sampling is less likely to work well if the marginal posterior and "
"LOO posterior are very different. This is more likely to happen with a non-robust "
"model and highly influential observations."
f"Estimated shape parameter of Pareto distribution is greater than {good_k:.1f} "
I'd probably show at least 2 digits; otherwise it might confuse people who check the plot to see why 0.68 is counted as bad when the printed threshold is 0.7 (but the good_k value is actually 0.66). The plot generally allows reading values to more than one digit of precision; all plots in https://python.arviz.org/en/stable/api/generated/arviz.plot_khat.html do, for example.
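A quick sketch of the rounding issue described in this comment, using only the numbers quoted above (0.66 for good_k and 0.68 for an observed khat). The variable names are illustrative and not taken from the PR.

```python
# Values quoted in the review comment; names are illustrative only.
good_k = 0.66  # actual threshold
khat = 0.68    # an observation slightly above the threshold

one_digit = f"{good_k:.1f}"   # what the message shows with one decimal
two_digits = f"{good_k:.2f}"  # what it would show with two decimals

print(f"threshold with .1f: {one_digit}")
print(f"threshold with .2f: {two_digits}")
print(f"khat with .1f:      {khat:.1f}")
```

With one decimal, the threshold and the offending khat print identically, so the reader cannot tell why the point was flagged. Two decimals keep them distinguishable.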
arviz/stats/stats_utils.py
(0.5, 0.7] (ok) {{3:{0}d}} {{7:6.1f}}%
(0.7, 1] (bad) {{4:{0}d}} {{8:6.1f}}%
(1, Inf) (very bad) {{5:{0}d}} {{9:6.1f}}%
(-Inf, {{8:.1f}}] (good) {{2:{0}d}} {{5:6.1f}}%
Same comment here: show at least 2 digits for the threshold.
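For context, the doubled braces in the template above suggest a two-pass use of str.format: a first pass fills in the column width and turns `{{...}}` into `{...}`, and a second pass fills in the data. The following is a minimal sketch of that pattern with a two-digit threshold; the field indices, the width, and the sample numbers are made up for illustration and do not match the ones in the diff.

```python
# Hypothetical one-row template; field indices differ from the PR's.
base = "(-Inf, {{0:.2f}}]  (good)  {{1:{0}d}}  {{2:6.1f}}%"

# Pass 1: substitute the count-column width, unescape doubled braces.
width = 4
template = base.format(width)

# Pass 2: substitute threshold, count, and percentage.
line = template.format(0.66, 97, 97.0)

print(template)
print(line)
```

Using `.2f` for the threshold field means the printed bin edge matches what a reader sees on the plot.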
    khats = khats.pareto_k
if not isinstance(khats, DataArray):
    raise ValueError("Incorrect khat data input. Check the documentation")
The ValueError belongs to a different if altogether; it is not the else branch of the one above. It is reached if the input isn't a NumPy array, DataArray, or ELPDData, or if the ELPDData for some reason doesn't store the khat data as a DataArray.
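To make the control flow concrete, here is a self-contained sketch of the two separate checks. It is not the ArviZ implementation: `FakeELPDData` and `FakeDataArray` are hypothetical stand-ins for ELPDData and xarray's DataArray, a plain list stands in for a NumPy array, and the warning text is invented.

```python
import warnings
from dataclasses import dataclass


@dataclass
class FakeDataArray:
    """Hypothetical stand-in for xarray.DataArray."""
    values: list


@dataclass
class FakeELPDData:
    """Hypothetical stand-in for ELPDData; only the two fields used here."""
    pareto_k: object
    good_k: float


def resolve_khats(khats):
    # Check 1: only ELPDData-like input carries its own threshold.
    if isinstance(khats, FakeELPDData):
        good_k = khats.good_k
        khats = khats.pareto_k
    else:
        good_k = None
        # Invented message; the review leaves the text unspecified.
        warnings.warn("no good_k available for this input", UserWarning)

    # Check 2: an independent `if`, not the `else` of check 1.
    # A list plays the role of a NumPy array in this sketch.
    if not isinstance(khats, (list, FakeDataArray)):
        raise ValueError("Incorrect khat data input. Check the documentation")
    return khats, good_k
```

Keeping the type validation outside the first if/else means it also catches ELPDData-like input whose pareto_k is not stored in the expected container.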
I think it's good now; we can merge once tests pass. If you can, it would be nice to add a line to the changelog too.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2349      +/-   ##
==========================================
+ Coverage   86.95%   86.97%   +0.01%
==========================================
  Files         123      123
  Lines       12722    12733      +11
==========================================
+ Hits        11063    11074      +11
  Misses       1659     1659
☔ View full report in Codecov by Sentry.
The newest revision of the PSIS paper now recommends:
📚 Documentation preview 📚: https://arviz--2349.org.readthedocs.build/en/2349/