Retry slurmd restarts on change #171

Merged
sjpb merged 1 commit into master from bugfix/slurm-retries
Nov 8, 2024
Conversation

Contributor

@jovial jovial commented Nov 8, 2024

There was a race condition between slurmctld starting up and slurmd. This adds a few retries to the slurmd restart to make it more robust.
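The diff itself is not reproduced in this conversation, so as a rough illustration of the general idea only, here is a minimal Python sketch of retrying a service restart a bounded number of times. The function names, the retry count, and the delay are all assumptions for illustration, not values taken from this change.

```python
import subprocess
import time


def restart_with_retries(restart, retries=3, delay=5.0, sleep=time.sleep):
    """Call restart() until it succeeds or `retries` attempts are used up.

    Returns the number of the attempt that succeeded. Re-raises the last
    error if every attempt fails. All defaults are illustrative assumptions.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            restart()
            return attempt
        except subprocess.CalledProcessError as exc:
            last_error = exc
            # Wait before the next attempt, e.g. to give slurmctld
            # time to finish starting up. No wait after the final attempt.
            if attempt < retries:
                sleep(delay)
    raise last_error


def restart_slurmd():
    # Hypothetical helper: restart the slurmd unit via systemctl.
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(["systemctl", "restart", "slurmd"], check=True)
```

Calling `restart_with_retries(restart_slurmd)` would then tolerate a couple of failed restarts while the controller is still coming up, and still fail loudly if slurmd never starts.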

@jovial jovial requested a review from a team as a code owner November 8, 2024 14:52
Collaborator

@sjpb sjpb left a comment


LGTM, although we don't understand why this is reproducible on a small client dev cluster and nowhere else. It may be related to poor volume performance due to Ceph traversing a router.

@sjpb sjpb merged commit b9f9d16 into master Nov 8, 2024
34 checks passed
@sjpb sjpb deleted the bugfix/slurm-retries branch November 8, 2024 16:21