Support accrual failure detection #346
accural -> accrual
jhalterman
changed the title
Support phi accural failure detection
Support phi accrural failure detection
Sep 17, 2022
For some reason my fingers always struggle with that one :)
😂 and it's still not right!
jhalterman
changed the title
Support phi accrural failure detection
Support phi accrual failure detection
Sep 17, 2022
This is definitely a sign that the new policy should not be named PhiAccrual :) I like the idea of thinking about a new policy more generally, as something that measures a series of execution times, where phi accrual is maybe just one strategy for determining if those times represent a failure.
jhalterman
changed the title
Support phi accrual failure detection
Support accrual failure detection
Sep 18, 2022
As Failsafe already supports policies that are useful for networked operations, it would make sense to support phi accrual (or other accrual algorithms) failure detection for situations where fixed timeouts don't adequately account for changing load conditions.
This could be implemented as a new policy which measures execution times over a number of executions, to determine if some threshold is crossed which represents a failure. Phi accrual could be one strategy supported by the policy, but there could be others. When the threshold is crossed, a fallback-like function could be called, for example, to fail over a system from one node that has failed to another. In that sense, the policy would be like a time-based fallback (rather than a result-based one), except that unlike a fallback it would be stateful.
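To make the idea concrete, here is a rough sketch of what the stateful part might look like, in plain Java. Nothing here is an existing Failsafe API: `PhiAccrualEstimator`, `record`, `phi`, `windowSize` and `minStdDevMillis` are all hypothetical names. It assumes execution times are roughly normally distributed and uses a logistic approximation of the normal CDF, which is one common way to compute phi but not the only one.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a phi accrual estimator; not part of Failsafe. */
final class PhiAccrualEstimator {
  private final int windowSize;          // assumed: number of recent executions to keep
  private final double minStdDevMillis;  // assumed: floor that avoids division by zero
  private final Deque<Double> samples = new ArrayDeque<>();
  private double sum;
  private double sumOfSquares;

  PhiAccrualEstimator(int windowSize, double minStdDevMillis) {
    if (windowSize < 1) throw new IllegalArgumentException("windowSize must be >= 1");
    if (minStdDevMillis <= 0) throw new IllegalArgumentException("minStdDevMillis must be > 0");
    this.windowSize = windowSize;
    this.minStdDevMillis = minStdDevMillis;
  }

  /** Records one completed execution time, evicting the oldest sample when the window is full. */
  synchronized void record(double millis) {
    samples.addLast(millis);
    sum += millis;
    sumOfSquares += millis * millis;
    if (samples.size() > windowSize) {
      double oldest = samples.removeFirst();
      sum -= oldest;
      sumOfSquares -= oldest * oldest;
    }
  }

  /**
   * Suspicion level for an execution that has been running for elapsedMillis:
   * phi = -log10(P(execution time > elapsedMillis)) under the windowed distribution.
   * Returns 0 when no samples have been recorded yet.
   */
  synchronized double phi(double elapsedMillis) {
    if (samples.isEmpty()) return 0.0;
    double mean = sum / samples.size();
    double variance = Math.max(0.0, sumOfSquares / samples.size() - mean * mean);
    double stdDev = Math.max(Math.sqrt(variance), minStdDevMillis);
    double y = (elapsedMillis - mean) / stdDev;
    // Logistic approximation of the normal CDF; e / (1 + e) approximates the upper tail.
    double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
    // Both branches are algebraically equal; the second avoids Infinity / Infinity
    // when e overflows for elapsed times far below the mean.
    double tail = elapsedMillis > mean ? e / (1.0 + e) : 1.0 - 1.0 / (1.0 + e);
    return -Math.log10(tail);
  }
}
```

A policy built on something like this could call `record` after each execution and compare `phi(elapsed)` against a user-supplied threshold to decide when to invoke the fallback-like function. The window here is count-based, so it only adapts as fast as new executions arrive.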
Alternatively, this could be implemented as a Timeout option, where the timeout is stateful and adapts to execution time distributions.
One open question for this policy: as with a circuit breaker or rate limiter, at what point should it "reset" after triggering a failure, or should it reset at all?
Any ideas for how this should work or what the policy should be named are welcome!