Support accrual failure detection #346

jhalterman · 2022-09-17T21:44:41Z

As Failsafe already supports policies that are useful for networked operations, it would make sense to support phi accrual (or other accrual algorithms) failure detection for situations where fixed timeouts don't adequately account for changing load conditions.

This could be implemented as a new policy that measures execution times over a number of executions to determine whether a threshold that represents a failure has been crossed. Phi accrual could be one strategy supported by the policy, but there could be others. When the threshold is crossed, a fallback-like function could be called, for example to fail over from a node that has failed to another node. In that sense, the policy would be like a time-based (rather than result-based) fallback, except that unlike a fallback it would be stateful.
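To make the idea concrete, here is a rough sketch of what a phi-style strategy over execution times might look like. Nothing here is an existing Failsafe API: `ExecutionTimeWindow`, its methods, and the `minStdDevMillis` floor are hypothetical names for illustration. It assumes durations are roughly normally distributed and uses a logistic approximation of the normal CDF, so phi is approximately `-log10(P(duration > elapsed))`.

```java
/** Sliding window of execution times with a phi-style suspicion score (illustrative only). */
final class ExecutionTimeWindow {
  private final double[] samples;       // ring buffer of recent durations, in milliseconds
  private final double minStdDevMillis; // floor that avoids division by zero on uniform samples
  private int count;                    // number of valid samples in the buffer
  private int next;                     // index of the next slot to overwrite

  ExecutionTimeWindow(int capacity, double minStdDevMillis) {
    if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
    this.samples = new double[capacity];
    this.minStdDevMillis = minStdDevMillis;
  }

  /** Records the duration of a completed execution, evicting the oldest sample when full. */
  void record(double durationMillis) {
    samples[next] = durationMillis;
    next = (next + 1) % samples.length;
    if (count < samples.length) count++;
  }

  double mean() {
    double sum = 0;
    for (int i = 0; i < count; i++) sum += samples[i];
    return count == 0 ? 0 : sum / count;
  }

  double stdDev() {
    if (count == 0) return minStdDevMillis;
    double mean = mean(), sumSq = 0;
    for (int i = 0; i < count; i++) sumSq += (samples[i] - mean) * (samples[i] - mean);
    return Math.max(Math.sqrt(sumSq / count), minStdDevMillis);
  }

  /** Suspicion level for an execution that has been running for elapsedMillis. */
  double phi(double elapsedMillis) {
    if (count == 0) return 0.0; // no history yet, so no suspicion
    double mean = mean();
    double y = (elapsedMillis - mean) / stdDev();
    // Logistic approximation of the normal CDF: F(y) ~= 1 / (1 + e)
    double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
    // Probability that a normal execution would take longer than elapsedMillis
    double pLater = elapsedMillis > mean ? e / (1.0 + e) : 1.0 - 1.0 / (1.0 + e);
    return -Math.log10(pLater);
  }

  /** True when the suspicion level reaches the configured threshold. */
  boolean isFailure(double elapsedMillis, double phiThreshold) {
    return phi(elapsedMillis) >= phiThreshold;
  }
}
```

With samples alternating between 90 ms and 110 ms, an execution that has been running for 100 ms gets a phi of about 0.3, while one at 200 ms scores far above a threshold such as 8. A policy could call a fallback-like function at that point.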

Alternatively, this could be implemented as a Timeout option, where the timeout is stateful and adapts to execution time distributions.
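As a sketch of the Timeout variant, the timeout could be derived from an exponentially weighted mean and variance of recent execution times, clamped to fixed bounds. This is only one possible approach and not an existing Failsafe option: the `AdaptiveTimeout` class, the `alpha` smoothing factor, and the `k` multiplier are all assumptions made for illustration.

```java
import java.time.Duration;

/** Timeout that tracks mean + k * stdDev of recent execution times (illustrative only). */
final class AdaptiveTimeout {
  private final Duration min;   // lower bound for the computed timeout
  private final Duration max;   // upper bound, also used before any sample is recorded
  private final double alpha;   // smoothing factor in (0, 1]; higher adapts faster
  private final double k;       // number of standard deviations of headroom
  private boolean seeded;
  private double meanMillis;
  private double varMillis;

  AdaptiveTimeout(Duration min, Duration max, double alpha, double k) {
    if (alpha <= 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in (0, 1]");
    this.min = min;
    this.max = max;
    this.alpha = alpha;
    this.k = k;
  }

  /** Folds a completed execution's duration into the running statistics. */
  void record(Duration duration) {
    double x = duration.toNanos() / 1_000_000.0;
    if (!seeded) {
      meanMillis = x;
      seeded = true;
      return;
    }
    // Incremental exponentially weighted mean and variance
    double diff = x - meanMillis;
    double incr = alpha * diff;
    meanMillis += incr;
    varMillis = (1 - alpha) * (varMillis + diff * incr);
  }

  /** Current timeout, clamped to [min, max]. */
  Duration current() {
    if (!seeded) return max;
    long millis = Math.round(meanMillis + k * Math.sqrt(varMillis));
    long clamped = Math.max(min.toMillis(), Math.min(max.toMillis(), millis));
    return Duration.ofMillis(clamped);
  }
}
```

The clamp keeps a burst of slow executions from inflating the timeout without bound, and keeps a run of very fast ones from shrinking it to something unrealistic.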

One open question for this policy is, similar to a circuit breaker or rate limiter, at what point should it "reset" after triggering a failure, or should it even reset?

Any ideas for how this should work or what the policy should be named are welcome!

Tembrel · 2022-09-17T22:15:04Z

accural -> accrual

jhalterman · 2022-09-17T22:18:01Z

For some reason my fingers always struggle with that one :)

Tembrel · 2022-09-17T23:25:08Z

😂 and it's still not right!

jhalterman · 2022-09-17T23:27:19Z

This is definitely a sign that the new policy should not be named PhiAccrual :) I like the idea of thinking about a new policy more generally, as something that measures a series of execution times, where phi accrual is maybe just one strategy for determining if those times represent a failure.

jhalterman added enhancement timeout labels Sep 17, 2022

jhalterman changed the title ~~Support phi accural failure detection~~ Support phi accrural failure detection Sep 17, 2022

jhalterman added new-policy and removed timeout labels Sep 17, 2022

jhalterman changed the title ~~Support phi accrural failure detection~~ Support phi accrual failure detection Sep 17, 2022

jhalterman changed the title ~~Support phi accrual failure detection~~ Support accrual failure detection Sep 18, 2022
