-
-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optimized simultaneous ECDF bands #2368
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2368 +/- ##
==========================================
- Coverage 87.01% 86.99% -0.03%
==========================================
Files 124 124
Lines 12788 12862 +74
==========================================
+ Hits 11128 11189 +61
- Misses 1660 1673 +13 ☔ View full report in Codecov by Sentry. |
return prob_right | ||
|
||
|
||
def _ecdf_band_optimization_objective( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should ideally be jitted with numba if available for a likely significant speed-up. Currently the problem is that the functions it calls use scipy.stats.binom.interval
and scipy.stats.binom.pmf
, which are not JIT-able. If we reimplement these functions here, then we could probably get a significant speed-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interval
calls ppf
, which defers to Boost's implementation, which in turn uses TOMS Algorithm 748 to find a root in an interval: https://doi.org/10.1145/210089.210111, so reimplementing that to be JIT-able would be the bottleneck. Which seems like it could be a lot of work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah wait, since interval
is called outside the for loop and only pmf
is called within, this could be simple.
@conditional_jit
doesn't seem to like it if I call another conditionally-jitted function within the first conditionally-jitted function, e.g.
E numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E Untyped global name '_update_ecdf_band_interior_probabilities': Cannot determine Numba type of <class 'arviz.utils.maybe_numba_fn'>
Is there a way to make this work?
I worked out how to JIT compile with numba and definitely see large speed-ups for low
bayesplot defaults to interpolation for |
Yeah, that's the only context where this makes sense, which is the same context as PIT plots. Ergo, I'll leave interpolation for a separate PR, and in this PR I'll use a benchmark to heuristically identify when we should switch from optimized bands to simulated as the default in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good, not sure if you still need help with numba, if so it wasn't clear to me what wasn't working. Do you have more changes planned for ecdf functionality? I wanted to test porting it to arviz-stats
too
No, it works now and is tested.
Yes, mainly to make this efficient for multi-panel PIT plots. For those cases one wants to compute the band once and then add it to each subplot. And the band can be computed more efficiently in many cases by us fitting the band probability using optimization for a wide range of values and then interpolating. Efficiency boosts shouldn't block porting it. It's just that as I implement the changes, it may motivate some code restructuring, so maybe it's better to hold off on porting until that is finished? Later I will add the rank ECDF plots and corresponding bands, but that's strictly addition of new features and maybe should be added directly to arviz-stats. |
Description
This pull request adds the optimization method from https://doi.org/10.1007/s11222-022-10090-6 for computing simultaneous ECDF bands and
switches to using it as the default approachuses it as the default approach sometimes based on a heuristic.Part of #2309. The changes are non-breaking, since the docstring of
plot_ecdf
intentionally does not document which method is the default.Example
Note that while optimized confidence bands are more stable than simulated, with this implementation they are not necessarily faster. Perhaps that can be improved using Numba.
Checklist
PR format?
section of the changelog?
📚 Documentation preview 📚: https://arviz--2368.org.readthedocs.build/en/2368/