Fix OOM Killing Nomad Agent #681

Merged
merged 1 commit into from
Sep 5, 2024

Conversation

@mpass99 (Contributor) commented Sep 5, 2024

Fixes the OOM killing of the Nomad agent by increasing the memory reservation per runner.

Closes #676

This may not be the most stable solution, as different environments (images) might require different amounts of memory. However, by increasing the memory reservation per runner, only as many runners are allocated on an agent as it can currently handle (given the server resources and the allocation definition).
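To illustrate the reasoning above, here is a minimal sketch of the arithmetic. It assumes the scheduler places runners on an agent only while the sum of their memory reservations fits into the agent's schedulable memory. The function name, the overhead parameter, and all numbers are hypothetical and not taken from this PR.

```python
def max_runners_per_agent(
    agent_memory_mb: int,
    reserved_per_runner_mb: int,
    agent_overhead_mb: int = 0,
) -> int:
    """Upper bound on runners placed on one agent, based on reservations only.

    agent_overhead_mb is a hypothetical allowance for memory that is not
    available to runners (operating system, Nomad agent itself).
    """
    if reserved_per_runner_mb <= 0:
        raise ValueError("reserved_per_runner_mb must be positive")
    schedulable_mb = max(agent_memory_mb - agent_overhead_mb, 0)
    return schedulable_mb // reserved_per_runner_mb


# Hypothetical agent with 16 GiB of memory.
low_reservation = max_runners_per_agent(16384, 16)   # many runners fit on paper
high_reservation = max_runners_per_agent(16384, 64)  # fewer runners are placed
print(low_reservation, high_reservation)
```

A larger reservation lowers the number of runners per agent, which is the effect this change relies on. The right value still depends on the image, as noted above.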

@mpass99 requested a review from MrSerth on September 5, 2024, 12:30
@mpass99 self-assigned this on Sep 5, 2024

codecov bot commented Sep 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.19%. Comparing base (e56b870) to head (16048c4).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #681   +/-   ##
=======================================
  Coverage   76.19%   76.19%           
=======================================
  Files          43       43           
  Lines        3592     3592           
=======================================
  Hits         2737     2737           
  Misses        625      625           
  Partials      230      230           

@MrSerth (Member) left a comment

We have memory oversubscription enabled anyway. Hence, the reservation is still far below the actual memory footprint required when all jobs on an agent execute student code submissions.
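A rough sketch of this point, assuming that with oversubscription each runner has a reservation used for placement and a higher maximum it may actually consume (in Nomad, the `memory` and `memory_max` values of a task's resources). The class, function names, and numbers are hypothetical and only illustrate the gap between the two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerMemory:
    reserved_mb: int  # value used for placement decisions
    max_mb: int       # upper bound a runner may consume with oversubscription


def worst_case_demand_mb(agent_memory_mb: int, runner: RunnerMemory) -> int:
    """Memory demand if every placed runner grows to its maximum at once."""
    placed = agent_memory_mb // runner.reserved_mb
    return placed * runner.max_mb


def can_overload(agent_memory_mb: int, runner: RunnerMemory) -> bool:
    """True if the worst-case demand exceeds the agent's physical memory."""
    return worst_case_demand_mb(agent_memory_mb, runner) > agent_memory_mb


# Hypothetical numbers: 16 GiB agent, 64 MB reserved, 512 MB maximum.
oversubscribed = RunnerMemory(reserved_mb=64, max_mb=512)
print(worst_case_demand_mb(16384, oversubscribed), can_overload(16384, oversubscribed))
```

As long as the maximum exceeds the reservation, the worst case can exceed the agent's memory, so a higher reservation narrows the gap but does not close it.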

@mpass99 (Contributor, Author) commented Sep 5, 2024

That's true, so we might still see the OOM killing behavior, and we still lack visibility into it, which prevents us from detecting such Nomad agent overloads.

@mpass99 mpass99 merged commit d83407e into main Sep 5, 2024
12 checks passed
@mpass99 mpass99 deleted the fix/#676-memory-limit branch September 5, 2024 13:25
Development

Successfully merging this pull request may close these issues.

Nomad Agents DoS on Migration
2 participants