feat: retry with exponential backoff for DynamoDb interaction #1975
Conversation
I decided against using the existing retry logic from
@rtyler: We're currently only retrying
In theory, this indicates that the error is also transient, but I'm not enough of an AWS expert to know whether it's sensible to handle this as well.
I think pulling the backoff crate into deltalake-aws is reasonable. The nice thing about the subcrates is that this new dependency only affects AWS users 😄
crates/deltalake-aws/src/lib.rs
Outdated
    .with_max_interval(Duration::from_secs(15))
    .with_max_elapsed_time(Some(Duration::from_secs(60)))
The max elapsed time of 60 seconds is, I assume, just a number that you've picked?
I think this will likely need to be configurable, since different systems have different tolerances here. Based on how `retry` is being invoked, I don't see a clear path to pulling configuration down into this call 🤔
Yes, these numbers would ideally be configurable. It seems other aspects of delta-rs, including the locking provider choice itself, are configured via environment variables, so I guess that's the way to go.
My question is mostly whether we want all five possible parameters to be configurable:
- max elapsed time
- max interval
- multiplier
- initial interval (using defaults in this PR)
- randomization factor (using defaults in this PR)
The most obvious one is probably the one you singled out, max elapsed time, but there's probably a user out there for any of these :).
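To make the interplay of these five parameters concrete, here is a minimal standard-library sketch of how an exponential backoff schedule could be derived from them. It is not the `backoff` crate's implementation: `BackoffParams`, `schedule`, the `max_steps` cap, and the injected `jitter` closure are all hypothetical names introduced for illustration, and the time spent inside the retried operation is ignored.

```rust
use std::time::Duration;

/// Hypothetical parameter set mirroring the five knobs discussed above.
#[derive(Debug, Clone)]
pub struct BackoffParams {
    pub initial_interval: Duration,
    pub multiplier: f64,
    pub max_interval: Duration,
    pub max_elapsed_time: Option<Duration>,
    pub randomization_factor: f64,
}

/// Returns the delays a retry loop would sleep for, in order.
///
/// `jitter` is expected to yield values in [-1.0, 1.0]; it is injected so the
/// sketch stays deterministic and needs no random-number crate.
/// `max_steps` caps the schedule when no elapsed-time limit is set.
pub fn schedule(
    p: &BackoffParams,
    max_steps: usize,
    mut jitter: impl FnMut() -> f64,
) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut current = p.initial_interval;
    let mut elapsed = Duration::ZERO;

    while delays.len() < max_steps {
        // Spread the delay around `current` by +/- randomization_factor.
        let factor = (1.0 + p.randomization_factor * jitter().clamp(-1.0, 1.0)).max(0.0);
        let delay = current.mul_f64(factor);

        // Stop once the next sleep would exceed the total time budget.
        if let Some(limit) = p.max_elapsed_time {
            if elapsed + delay > limit {
                break;
            }
        }

        elapsed += delay;
        delays.push(delay);

        // Grow the base interval, but never beyond the configured ceiling.
        current = current.mul_f64(p.multiplier).min(p.max_interval);
    }
    delays
}
```

With a 15 second ceiling and a 60 second budget (the two values visible in the reviewed snippet), plus an assumed 1 second initial interval, a multiplier of 2, and no randomization, this sketch yields sleeps of 1, 2, 4, 8, 15, 15, and 15 seconds. That illustrates why max elapsed time is the parameter that bounds how long a caller can be blocked overall.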
@dispanser I agree there are likely folks who would want to tune every single one of these... maybe.
I think an environment variable for max elapsed time is the only one that's really important to implement before merging this. People may or may not want the others, but they can make themselves known in the future 😄
I added an environment variable for the max elapsed time setting.
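As a rough illustration of what reading such a setting could look like, the sketch below parses a number of seconds from an environment variable and falls back to 60 seconds, the value from the reviewed snippet. The variable name `EXAMPLE_DYNAMO_MAX_ELAPSED_SECONDS` and both function names are hypothetical; the name actually used in the PR does not appear in this conversation.

```rust
use std::time::Duration;

/// Hypothetical variable name, chosen only for this example.
pub const MAX_ELAPSED_ENV: &str = "EXAMPLE_DYNAMO_MAX_ELAPSED_SECONDS";

/// Fallback matching the 60 seconds in the reviewed snippet.
pub const DEFAULT_MAX_ELAPSED: Duration = Duration::from_secs(60);

/// Parses a whole number of seconds; unset or malformed input yields the default.
pub fn parse_max_elapsed(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_MAX_ELAPSED)
}

/// Reads the setting from the process environment.
pub fn max_elapsed_from_env() -> Duration {
    parse_max_elapsed(std::env::var(MAX_ELAPSED_ENV).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback behavior easy to test without mutating process-wide state.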
I'd address the diverging docs in a separate PR; they are already way off anyway :)
d478153 to 3b4d6d4
3b4d6d4 to fa0bc7d
I think the failing test (macos,
82eeaca to 760b205
@dispanser please rebase this on the latest main with the mega refactor 😄
760b205 to 251dded
We use an external crate, [backoff](https://crates.io/crates/backoff), to retry DynamoDb read and write operations when the error in the response is `ProvisionedThroughputExceeded`, which indicates that DynamoDb is overloaded with respect to the configured read and write capacity.
251dded to caa6fb8
@rtyler rebase is done, all tests pass locally.
caa6fb8 to 21338cd
Description
We use an external crate, backoff, to retry DynamoDb read and write operations when the error in the response is `ProvisionedThroughputExceeded`, which indicates that DynamoDb is overloaded with respect to the configured read and write capacity.
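The idea of retrying only on a throughput error can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `OpError` is a hypothetical stand-in for the DynamoDb client's error type, `retry_transient` is a hypothetical helper, and the delay simply doubles without jitter, whereas the PR delegates this to the backoff crate.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the DynamoDb client's error type.
#[derive(Debug, PartialEq)]
pub enum OpError {
    /// Represents a throughput-exceeded response, treated as transient.
    Throttled,
    /// Any other failure, treated as permanent.
    Other(String),
}

/// Retries `op` while it reports `Throttled`, doubling the sleep each time
/// up to `max_interval`, and gives up once the next sleep would push the
/// total time past `max_elapsed`.
pub fn retry_transient<T>(
    max_elapsed: Duration,
    initial: Duration,
    max_interval: Duration,
    mut op: impl FnMut() -> Result<T, OpError>,
) -> Result<T, OpError> {
    let start = Instant::now();
    let mut delay = initial;
    loop {
        match op() {
            // Transient error and still within budget: wait, then try again.
            Err(OpError::Throttled) if start.elapsed() + delay <= max_elapsed => {
                std::thread::sleep(delay);
                delay = (delay * 2).min(max_interval);
            }
            // Success, a permanent error, or an exhausted budget: return as-is.
            other => return other,
        }
    }
}
```

Permanent errors are returned after the first attempt, so only the overload case pays the cost of waiting.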