It would be nice if, when a streaming job fails with an out-of-memory (OOM) error, the Arcane operator could automatically recreate the stream with increased resources.
Possible solutions
Multiple job references
For each streaming job, add an annotation that records which job template was used to create the job.
Replace the single references in `jobTemplateRef` and `backfillJobTemplateRef` with ordered arrays of references.
If a job fails with a specific exception, exit code, or similar signal, select the next job template from the array.
This approach is not backward-compatible, but it can handle not only OOM events but also other failures, such as automatically evicting a job from a failing availability zone (AZ) to another AZ.
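A minimal sketch of the fallback selection, assuming the annotation stores the template name and that the operator learns the container's termination reason from the pod status. The annotation key `example.com/job-template`, the function `next_template`, and the template names are hypothetical; only `OOMKilled` is a real Kubernetes termination reason.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical annotation key; the issue does not name one.
TEMPLATE_ANNOTATION = "example.com/job-template"

# "OOMKilled" is the termination reason Kubernetes reports for a container
# killed by the out-of-memory killer. Other reasons could be added here.
FALLBACK_REASONS = frozenset({"OOMKilled"})


def next_template(
    template_refs: Sequence[str],
    job_annotations: Mapping[str, str],
    failure_reason: str,
) -> Optional[str]:
    """Pick the template for the next attempt, or None to stop falling back."""
    if failure_reason not in FALLBACK_REASONS or not template_refs:
        return None
    # A job without the annotation is assumed to use the first template,
    # which keeps jobs created before the annotation existed working.
    used = job_annotations.get(TEMPLATE_ANNOTATION, template_refs[0])
    if used not in template_refs:
        return None  # cannot locate the job in the ordered list
    position = template_refs.index(used)
    if position + 1 < len(template_refs):
        return template_refs[position + 1]
    return None  # every template has been tried


refs = ["stream-small", "stream-medium", "stream-large"]
print(next_template(refs, {TEMPLATE_ANNOTATION: "stream-small"}, "OOMKilled"))
```

When a template is chosen, the recreated job would carry the same annotation set to the new template name, so a later failure advances one more step along the array.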
Scale Factor
Add a `scalingFactor` field to the SD (stream definition).
If a job fails with an OOM error, multiply its resource demands by the `scalingFactor` value.
This is easier to implement, but not as flexible as the multiple-job-references approach. It also does not require increasing the number of job templates in the cluster.
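A minimal sketch of applying `scalingFactor` to a container's resource map, assuming values use common Kubernetes quantity suffixes. The helpers `parse_quantity` and `scale_resources` are hypothetical, cover only a subset of the quantity grammar, and round up so the scaled request is never smaller than the exact product.

```python
import math
from decimal import Decimal
from typing import Dict, Mapping

# Subset of Kubernetes quantity suffixes: binary, decimal, and milli.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '512Mi' or '500m' to a plain number."""
    # Try two-character suffixes first so 'Mi' is not read as 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def scale_resources(resources: Mapping[str, str], factor: float) -> Dict[str, str]:
    """Multiply cpu and memory by factor; other entries pass through unchanged."""
    if factor <= 0:
        raise ValueError("scaling factor must be positive")
    f = Decimal(str(factor))  # avoid binary float artifacts
    scaled = {}
    for name, quantity in resources.items():
        if name == "cpu":
            # Express CPU in millicores, rounded up.
            scaled[name] = f"{math.ceil(parse_quantity(quantity) * f * 1000)}m"
        elif name == "memory":
            # Express memory in MiB, rounded up.
            scaled[name] = f"{math.ceil(parse_quantity(quantity) * f / _SUFFIXES['Mi'])}Mi"
        else:
            scaled[name] = quantity
    return scaled


print(scale_resources({"cpu": "500m", "memory": "512Mi"}, 1.5))
```

The same function would be applied to both `requests` and `limits`. Applying it once per OOM failure compounds the factor, so a cap on the number of retries (or on the maximum size) is probably worth considering.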
Alternatives
Use the Kubernetes VerticalPodAutoscaler (VPA) to adjust resource requests automatically.