Use revised Pareto k threshold #2349

aloctavodia · 2024-05-27T14:07:17Z

The newest revision of the PSIS paper now recommends:

"good" ---------> k < min(1 - 1/log10(S), 0.7)
"bad" -----------> min(1 - 1/log10(S), 0.7) <= k < 1
"very bad" ---> k > 1
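
As a rough illustration of the rule above, here is a minimal sketch that computes the sample-size-dependent threshold and classifies a single k value. The function names `pareto_k_threshold` and `classify_k` are hypothetical and not part of ArviZ; placing the boundary case k = 1 in "very bad" is also an assumption of this sketch.

```python
import math


def pareto_k_threshold(n_samples: int) -> float:
    """Return min(1 - 1/log10(S), 0.7) for S posterior draws (S must be > 1)."""
    return min(1 - 1 / math.log10(n_samples), 0.7)


def classify_k(k: float, n_samples: int) -> str:
    """Label a Pareto k value using the revised thresholds quoted above."""
    good_k = pareto_k_threshold(n_samples)
    if k < good_k:
        return "good"
    if k < 1:
        return "bad"
    # Assumption: k == 1 is grouped with "very bad" in this sketch.
    return "very bad"


# With 1000 draws the threshold is about 0.667, so k = 0.68 is "bad";
# with 4000 draws the threshold is capped at 0.7, so the same k is "good".
for draws in (1000, 4000):
    print(draws, round(pareto_k_threshold(draws), 3), classify_k(0.68, draws))
```

The threshold only drops below 0.7 when S is smaller than roughly 2150 draws, so the change mostly affects short runs.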

📚 Documentation preview 📚: https://arviz--2349.org.readthedocs.build/en/2349/

arviz/plots/backends/bokeh/khatplot.py

arviz/plots/khatplot.py

aloctavodia · 2024-05-28T17:01:50Z

The "good_k" value is computed once and stored in the ELPDData object. I am open to suggestions for a better name.

OriolAbril · 2024-06-05T09:53:36Z

arviz/plots/khatplot.py

        if isinstance(khats, ELPDData):
+            good_k = khats.good_k
            khats = khats.pareto_k


        if isinstance(khats, ELPDData):
            good_k = khats.good_k
            khats = khats.pareto_k
        else:
            good_k = None
            warnings.warn()

It should be something like this instead. Right now, DataArrays are also valid input because they are array-like, but they carry more information than NumPy arrays do, so they are treated more like ELPDData input. I think this explains some of the test failures.

Note: also rebase on main to avoid unrelated failures that have been fixed already.

OriolAbril · 2024-06-05T09:58:31Z

arviz/stats/stats.py

-            "importance sampling is less likely to work well if the marginal posterior and "
-            "LOO posterior are very different. This is more likely to happen with a non-robust "
-            "model and highly influential observations."
+            f"Estimated shape parameter of Pareto distribution is greater than {good_k:.1f} "


I'd probably show at least 2 digits; otherwise it might confuse people who check the plot to see why 0.68 is counted as bad when the printed value is 0.7 (but the good_k value is actually 0.66). The plots generally allow reading values to more than one digit of precision; all plots in https://python.arviz.org/en/stable/api/generated/arviz.plot_khat.html do, for example.
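
To make the rounding concern concrete, here is a small sketch comparing one-digit and two-digit formatting of the threshold. The draw count of 1000 is a hypothetical value chosen so that the threshold falls below 0.7; it is not taken from the PR.

```python
import math

n_samples = 1000  # hypothetical number of posterior draws
good_k = min(1 - 1 / math.log10(n_samples), 0.7)  # roughly 0.667

one_digit = f"{good_k:.1f}"   # rounds up and hides the real cutoff
two_digits = f"{good_k:.2f}"  # close enough to compare against the plot

k = 0.68  # a value a user might read off the khat plot
print(f"threshold shown with 1 digit:  {one_digit}")
print(f"threshold shown with 2 digits: {two_digits}")
print(f"k = {k} exceeds the threshold: {k > good_k}")
```

With one digit the message would report 0.7 while flagging k = 0.68, which looks contradictory; with two digits the comparison is readable.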

OriolAbril · 2024-06-05T09:59:07Z

arviz/stats/stats_utils.py

- (0.5, 0.7]   (ok)       {{3:{0}d}} {{7:6.1f}}%
-   (0.7, 1]   (bad)      {{4:{0}d}} {{8:6.1f}}%
-   (1, Inf)   (very bad) {{5:{0}d}} {{9:6.1f}}%
+(-Inf, {{8:.1f}}]   (good)     {{2:{0}d}} {{5:6.1f}}%


same comment here
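
The doubled braces in the diff suggest the table template is formatted in two passes: the first pass fills in the column width, and the second fills in the values. The following is a minimal sketch of that pattern with two-digit precision for the threshold. `ROW_TEMPLATE` and `render_good_row` are hypothetical names, and the field indices are simplified relative to the template in the diff.

```python
# Hypothetical one-row template modeled on the "(good)" line in the diff.
# Doubled braces survive the first .format() call as single braces.
ROW_TEMPLATE = "(-Inf, {{2:.2f}}]   (good)     {{0:{0}d}} {{1:6.1f}}%"


def render_good_row(count: int, percent: float, good_k: float, width: int = 4) -> str:
    """Format one table row in two passes: width first, then values."""
    # First pass: "{0}" becomes the width, "{{...}}" becomes "{...}".
    row_fmt = ROW_TEMPLATE.format(width)
    # Second pass: fill in the count, the percentage and the threshold.
    return row_fmt.format(count, percent, good_k)


print(render_good_row(95, 95.0, 2 / 3))
```

Changing `.1f` to `.2f` in the threshold field is the only difference needed to show two digits in the printed table.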

OriolAbril · 2024-06-05T16:39:22Z

arviz/plots/khatplot.py

            khats = khats.pareto_k
        if not isinstance(khats, DataArray):
-            raise ValueError("Incorrect khat data input. Check the documentation")


The ValueError belongs to a different if-statement altogether; it is not the else branch of the one above. It is reached if the input isn't a NumPy array, DataArray, or ELPDData, or if the ELPDData for some reason doesn't store the khat data as a DataArray.

OriolAbril

I think it's good now; we can merge once tests pass. If you can, it would be nice to add a line to the changelog too.

codecov · 2024-06-05T17:25:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.97%. Comparing base (eda7f38) to head (3d11015).
Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2349      +/-   ##
==========================================
+ Coverage   86.95%   86.97%   +0.01%     
==========================================
  Files         123      123              
  Lines       12722    12733      +11     
==========================================
+ Hits        11063    11074      +11     
  Misses       1659     1659

OriolAbril reviewed May 28, 2024

OriolAbril reviewed Jun 5, 2024

aloctavodia added 3 commits June 5, 2024 10:40

use revised Pareto k threshold

aee1c0d

avoid duplicated computations

5cb460e

fix per comments

ada4736

aloctavodia force-pushed the good_k branch from f7dc5ac to ada4736 on June 5, 2024 13:41

OriolAbril reviewed Jun 5, 2024

fix ValueError and warning

42a69ec

OriolAbril approved these changes Jun 5, 2024

update changelog

3d11015

aloctavodia merged commit 3a454f7 into main Jun 5, 2024
12 checks passed

aloctavodia deleted the good_k branch June 5, 2024 18:01
