It appears that Poseidon handles these cases correctly:
For runner jobs, Poseidon removes these jobs from the idle/used runners. After some time, the Prewarming Pool Alert might recreate the runner jobs.
For environment jobs, Poseidon logs "Environment stopped unexpectedly". Once we have gained more insight into this behavior, we might introduce recovery techniques that apply when an unexpectedly stopped environment is detected.
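The two cases above can be illustrated with a minimal sketch. It is not Poseidon's actual implementation: the `RunnerPool` class, the `handle_stopped_job` method, and the job IDs are hypothetical names chosen for illustration, and the sketch is written in Python only for brevity.

```python
import logging

logger = logging.getLogger("poseidon-sketch")


class RunnerPool:
    """Hypothetical stand-in for the idle/used runner bookkeeping."""

    def __init__(self):
        self.idle = set()
        self.used = set()

    def handle_stopped_job(self, job_id, is_environment_job):
        """React to a job that Nomad reports as stopped."""
        if is_environment_job:
            # Environment jobs are only logged; no recovery is attempted.
            logger.warning("Environment stopped unexpectedly: %s", job_id)
            return
        # Runner jobs are dropped from whichever set currently holds them.
        self.idle.discard(job_id)
        self.used.discard(job_id)


# Hypothetical job IDs, for illustration only.
pool = RunnerPool()
pool.idle.add("runner-1")
pool.used.add("runner-2")
pool.handle_stopped_job("runner-1", is_environment_job=False)
pool.handle_stopped_job("environment-1", is_environment_job=True)
```

In this sketch, recreating the removed runner jobs is left to a separate mechanism, mirroring the role the Prewarming Pool Alert plays in the description above.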
Of course, it would be best if no runners died on restarts. However, in #673 we already identified a local and an upstream issue that should be fixed to improve this scenario.
We finished our investigation and did not find any issue in Poseidon. Still, this issue will benefit from potential improvements made as part of #673. Nevertheless, since there is no work left for this issue, we are closing it.
In #612 we noticed that on a simultaneous restart of all Nomad agents, some jobs are completely gone despite their restart and rescheduling policies.

nomadEventDump-RestartTogether-withoutAlert.txt