ASSETS-8997 add serviceoverload error reason #71
base: master
@@ -121,6 +122,14 @@ class RenditionTooLarge extends ClientError {
    }
}

// Worker encountered upstream API rate limiting. Client may resubmit request after some time.
class ServiceOverLoadError extends ClientError {
We have a similar error in api-process already (look for `TooManyRequestsError`). Could we merge those two classes into one instead, and refactor a bit where possible?
I based the ServiceOverLoad reason name on the design documents attached to ASSETS-8997, though there are places where the error is listed as "TooManyRequests/ServiceOverLoad", and it wasn't clear whether there was ambiguity over which error name to use throughout, or whether having both was intentional. I personally see value in supporting both errors: TooManyRequests is more readily associated with an HTTP 429 response originating from the Asset Compute Service itself, with the ability to provide a `Retry-After` directive for the client, while ServiceOverLoad would represent a more general error type that Asset Compute can throw asynchronously when it encounters throttling from upstream/3rd-party services (such as when a worker receives a 429 Too Many Requests HTTP response).
If the AEM client receives either error, the proper behavior is to retry the original request after some time has passed. With TooManyRequests the client may be given an explicit `Retry-After`, whereas with ServiceOverLoad it's basically `Retry-After: 🤷`.
I kind of had a hybrid approach in mind where we could support both of these error types in AEM for rendition_failed events just in case, and define both types in `asset-compute-commons`, along with making the semantic distinction more clearly defined along the lines I described above. Would that work?
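To make the proposed distinction concrete, here is a minimal sketch of how a client might choose a retry delay for each reason. The helper name `retryDelayMs`, the event fields `errorReason` and `retryAfterSeconds`, and the 30-second fallback are assumptions made for illustration; they are not taken from the AEM client or from asset-compute-commons.

```javascript
// Hypothetical helper: decide how long a client should wait before
// resubmitting a request after a rendition_failed event.
// Returns a delay in milliseconds, or null if the failure is not retryable.
function retryDelayMs(event, fallbackDelayMs = 30000) {
    if (event.type !== "rendition_failed") {
        return null;
    }
    switch (event.errorReason) {
    case "TooManyRequests":
        // The service itself throttled the request and may supply an
        // explicit hint (assumed field name: retryAfterSeconds).
        if (Number.isFinite(event.retryAfterSeconds) && event.retryAfterSeconds >= 0) {
            return event.retryAfterSeconds * 1000;
        }
        return fallbackDelayMs;
    case "ServiceOverLoad":
        // An upstream service was overloaded; no hint is available,
        // so the client falls back to its own delay.
        return fallbackDelayMs;
    default:
        // Any other reason is treated as not retryable in this sketch.
        return null;
    }
}
```

Both reasons lead to a retry; the only difference is whether the delay comes from the service or from the client's own policy.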
It would work, but I'm not sure whether having two different error names initially was intentional or not. @pheenomenon can probably clarify whether the two different errors were intended or are just "synonyms" (talking about the current design, not what we'll have eventually).
The use of "TooManyRequests/ServiceOverLoad" was not meant to imply that the two are the same; it was only used to express the idea.
I have seen that our downstream services can get overloaded for a variety of reasons and return 500 instead of 429. So I like the idea of keeping it flexible as `ServiceOverload` instead of `TooManyRequestsError`.
As for the question of whether `TooManyRequestsError` (which we use in api-process for Nui throttling) should be converged into `ServiceOverload`: we can take that route if we want, but that won't have an API dependency with AEM and won't bring a huge advantage. So the hybrid approach sounds good to me too.
In that case (which is also what confused me): Although 500 is generic, 503 should be `ServiceOverload` then (503 usually means server is busy - but our services don't use it yet as far as I know). Otherwise it could be confusing for developers using our APIs: Why do they get an Overload error when there is a 500 (which could be anything, since it's generic)?
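One possible reading of this point, sketched as a status-to-reason mapping a worker could apply to upstream responses. The function name `reasonForUpstreamStatus` and the use of "GenericError" as the fallback reason string are assumptions; including 429 follows the PR description rather than this comment, and the thread does not settle the final mapping.

```javascript
// Hypothetical mapping from an upstream HTTP status to a rendition_failed reason.
// 429 (rate limited) and 503 (service unavailable) signal overload;
// a plain 500 stays generic because it could mean anything.
function reasonForUpstreamStatus(status) {
    if (status === 429 || status === 503) {
        return "ServiceOverLoad";
    }
    return "GenericError";
}
```

With this mapping, a developer who sees a generic 500 from an upstream service is never told that the service was overloaded.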
(Could we maybe still move the TooManyRequests exception here too, while at it, @adamcin? If it doesn't throw you off-track?)
Approving, with the note that ServiceOverload should be reserved for HTTP code 503, and not generic.
LGTM, please additionally update the readme with the new error type and description:
https://github.com/adobe/asset-compute-commons#custom-errors
jira: https://jira.corp.adobe.com/browse/ASSETS-8997
downstream dependent @adobe/asset-compute-sdk PR#182
This change adds a new `rendition_failed` reason (`ServiceOverLoad`) for use by asset compute workers that encounter upstream API rate limiting and need to indicate to downstream clients that a resubmission of the original asset compute request is necessary after some time has passed.
Also defined is a `ServiceOverLoadError` type, which extends `ClientError` rather than `GenericError`, because it is defined in the spirit of HTTP 4xx (429, specifically).
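A minimal sketch of the shape described above. The `ClientError` class here is a simplified stand-in written for this example, and its constructor signature `(message, name, reason)` is an assumption; the real base class in asset-compute-commons may differ.

```javascript
// Simplified stand-in for the library's base class (assumed signature).
class ClientError extends Error {
    constructor(message, name, reason) {
        super(message);
        this.name = name;
        this.reason = reason;
    }
}

// Worker encountered upstream API rate limiting.
// Client may resubmit the request after some time.
class ServiceOverLoadError extends ClientError {
    constructor(message) {
        super(message, "ServiceOverLoadError", "ServiceOverLoad");
    }
}
```

A worker would throw `new ServiceOverLoadError("upstream service returned 429")`, and code handling the failure could branch on `err.reason === "ServiceOverLoad"` or on `err instanceof ClientError`.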