Check Event Handling on Nomad Agent Simultaneous Restart #674

Closed
mpass99 opened this issue Sep 4, 2024 · 2 comments
Labels
question Further information is requested

Comments

@mpass99
Contributor

mpass99 commented Sep 4, 2024

In #612 we noticed that when all Nomad agents restart simultaneously, some jobs disappear completely despite their restart and rescheduling policies.

  • Check whether Poseidon handles such cases correctly

nomadEventDump-RestartTogether-withoutAlert.txt

@mpass99 added the question label on Sep 4, 2024
@mpass99
Contributor Author

mpass99 commented Sep 5, 2024

It appears that Poseidon handles these cases correctly:

  • For runner jobs, Poseidon removes the affected jobs from the idle/used runners. After some time, the Prewarming Pool Alert might recreate the runner jobs.
  • For environment jobs, Poseidon logs "Environment stopped unexpectedly". Once we have gained more insight into this behavior, we might introduce recovery techniques that are triggered when an unexpectedly stopped environment is detected.
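The handling described above can be illustrated with a minimal Go sketch. This is not Poseidon's actual code: the `Pool` type, `HandleStopped`, and `MissingIdle` are hypothetical names invented for illustration, and the job kind is passed in explicitly rather than derived from a Nomad event. It only shows the two reactions named in this comment: dropping a stopped runner from the idle/used sets, and logging a stopped environment.

```go
package main

import "fmt"

// JobKind distinguishes the two kinds of jobs discussed in this issue.
type JobKind int

const (
	RunnerJob JobKind = iota
	EnvironmentJob
)

// Pool is a hypothetical in-memory stand-in for the idle and used runners.
type Pool struct {
	idle map[string]bool
	used map[string]bool
	logs []string // collected log messages (assumption: real code would use a logger)
}

func NewPool() *Pool {
	return &Pool{idle: map[string]bool{}, used: map[string]bool{}}
}

// HandleStopped reacts to a job that stopped unexpectedly.
// It returns true if a runner was removed from the idle or used set.
func (p *Pool) HandleStopped(jobID string, kind JobKind) bool {
	if kind == EnvironmentJob {
		// No recovery yet: the stop is only logged.
		p.logs = append(p.logs, fmt.Sprintf("Environment stopped unexpectedly: %s", jobID))
		return false
	}
	removed := p.idle[jobID] || p.used[jobID]
	delete(p.idle, jobID)
	delete(p.used, jobID)
	return removed
}

// MissingIdle reports how many idle runners a prewarming check would have
// to recreate to reach the target pool size (hypothetical helper).
func (p *Pool) MissingIdle(target int) int {
	if missing := target - len(p.idle); missing > 0 {
		return missing
	}
	return 0
}

func main() {
	p := NewPool()
	p.idle["runner-a"] = true
	p.used["runner-b"] = true

	// Both runner jobs vanish after the simultaneous agent restart.
	p.HandleStopped("runner-a", RunnerJob)
	p.HandleStopped("runner-b", RunnerJob)
	p.HandleStopped("environment-1", EnvironmentJob)

	fmt.Println("idle runners to recreate:", p.MissingIdle(1))
	fmt.Println("log entries:", p.logs)
}
```

In this sketch nothing recreates the runners immediately; a later prewarming check would notice the shortfall, which mirrors the "after some time" behavior described above.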

Of course, it would be best if no runners died on restarts. However, in #673 we already identified a local and an upstream issue that should be fixed to improve this scenario.

@MrSerth
Member

MrSerth commented Sep 25, 2024

We finished our investigation and did not find any issue in Poseidon. This scenario will still benefit from potential improvements made as part of #673. Since there is no work left for this issue, we are closing it.

@MrSerth closed this as completed on Sep 25, 2024