Respond to nodeConditions changes #1518

Open
nogazax opened this issue Sep 17, 2024 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@nogazax

nogazax commented Sep 17, 2024

Is your feature request related to a problem? Please describe.
When using nodeProblemDetector (NPD), there is a gap: NPD marks problematic nodes by setting a nodeCondition, while Descheduler (DSC) only watches taints.

Describe the solution you'd like
I want to fill this gap with a controller that is notified of nodeCondition changes.
The controller reads the condition and, if it matches some criteria (implementation and restrictions TBD), either taints the node or deschedules its pods (custom taint vs. a simple cordon is TBD).

After tainting, DSC starts removing the pods from these nodes, and clusterAutoScaler removes each node once it becomes underutilized.
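The decision step of such a controller could look roughly like the sketch below. It is not taken from the POC: `NodeCondition` and `Taint` are simplified stand-ins for the Kubernetes core/v1 types, and `Rule`, `DesiredTaints`, and the taint key used in the usage note are hypothetical names chosen for illustration. Watching nodes and patching them through the API server is left out.

```go
package main

// NodeCondition and Taint are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types, so the sketch has no external dependencies.
type NodeCondition struct {
	Type   string // e.g. "KernelDeadlock", as set by NPD
	Status string // "True", "False" or "Unknown"
}

type Taint struct {
	Key    string
	Value  string
	Effect string // e.g. "NoSchedule"
}

// Rule (hypothetical) maps one condition type and status to a taint.
type Rule struct {
	ConditionType   string
	ConditionStatus string
	Taint           Taint
}

// DesiredTaints returns the taints requested by the rules for the given
// conditions, in rule order and without duplicates.
func DesiredTaints(conditions []NodeCondition, rules []Rule) []Taint {
	var out []Taint
	seen := make(map[Taint]bool)
	for _, r := range rules {
		for _, c := range conditions {
			matches := c.Type == r.ConditionType && c.Status == r.ConditionStatus
			if matches && !seen[r.Taint] {
				seen[r.Taint] = true
				out = append(out, r.Taint)
			}
		}
	}
	return out
}
```

For example, a rule mapping `KernelDeadlock` with status `"True"` to a `NoSchedule` taint with the made-up key `example.com/kernel-deadlock` yields that taint only for nodes where NPD has set the condition. A real controller would then compare the result with the node's current taints and patch the difference.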

Describe alternatives you've considered
A non-controller application: IMO, for this use case it is better to subscribe to events than to poll the API server for them.
A forked version of the Draino project: it is unmaintained and has multiple security vulnerabilities, so it is not my preferred option.
Adding this functionality to NPD: the PR for it has been stale for a long time.

What version of descheduler are you using?

descheduler version:
v0.30.0

Additional context
I wrote a small POC.
NPD sets the condition -> my controller recognizes it and adds a taint to the node -> DSC acts on pods that have no matching toleration
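The last step of that flow depends on whether a pod tolerates the new taint. A minimal sketch of that check follows, assuming the usual Kubernetes matching rules (an empty toleration key or effect acts as a wildcard, `Exists` ignores the value, `Equal` compares it). The types are simplified stand-ins for the core/v1 ones and `HasUntoleratedTaint` is a hypothetical helper, not a Descheduler function.

```go
package main

// Taint and Toleration are simplified, hypothetical stand-ins for the
// Kubernetes core/v1 types.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

type Toleration struct {
	Key      string // empty matches every key
	Operator string // "Exists" or "Equal"; empty is treated as "Equal"
	Value    string
	Effect   string // empty matches every effect
}

// Tolerates reports whether a single toleration matches a single taint.
func Tolerates(tol Toleration, t Taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == t.Value
	default:
		return false
	}
}

// HasUntoleratedTaint reports whether at least one taint is not matched by
// any of the tolerations, i.e. whether the pod would be an eviction candidate.
func HasUntoleratedTaint(tolerations []Toleration, taints []Taint) bool {
	for _, t := range taints {
		tolerated := false
		for _, tol := range tolerations {
			if Tolerates(tol, t) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return true
		}
	}
	return false
}
```

A pod with no tolerations on a node carrying any taint is reported as violating, while a pod with an `Exists` toleration for the taint's key is left alone.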

@nogazax nogazax added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 17, 2024
@a7i
Contributor

a7i commented Sep 18, 2024

Would you be able to use the existing RemovePodsViolatingNodeTaints?

@nogazax
Author

nogazax commented Sep 18, 2024

I think so, but I'm not sure it is the right place to add this:

  1. If yes, should it be renamed to RemovePodsViolatingNodeTaintsAndConditions?
  2. Another option is to create a separate plugin, RemovePodsViolatingNodeCondition. This keeps each plugin simple and adds the functionality, but it somewhat duplicates the business logic.
  3. The last option is to add a small component that taints nodes based on conditions and lets the current controller do its job.

WDYT?

@googs1025
Member

Maybe RemovePodsViolatingCustomNodeConditions is more appropriate. 🤔 The built-in NodeConditions are already caught by the nodeLifeCycleController, which evicts the affected pods.
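One way to separate custom conditions from built-in ones is sketched below. It assumes the built-in set is the five node condition types defined in core/v1 (`Ready`, `MemoryPressure`, `DiskPressure`, `PIDPressure`, `NetworkUnavailable`). `NodeCondition` is a simplified stand-in type and `CustomConditions` is a hypothetical helper, not part of any existing plugin.

```go
package main

// NodeCondition is a simplified, hypothetical stand-in for the
// Kubernetes core/v1 type.
type NodeCondition struct {
	Type   string
	Status string
}

// builtInConditionTypes lists the node condition types assumed to be
// handled by Kubernetes itself.
var builtInConditionTypes = map[string]bool{
	"Ready":              true,
	"MemoryPressure":     true,
	"DiskPressure":       true,
	"PIDPressure":        true,
	"NetworkUnavailable": true,
}

// CustomConditions returns only the conditions whose type is not built in,
// for example those set by NPD, preserving their original order.
func CustomConditions(conditions []NodeCondition) []NodeCondition {
	var custom []NodeCondition
	for _, c := range conditions {
		if !builtInConditionTypes[c.Type] {
			custom = append(custom, c)
		}
	}
	return custom
}
```

A plugin restricted to custom conditions would run its matching logic only on the returned slice and leave the built-in types alone.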

@googs1025
Member

@a7i @ingvagabund any comment? 😄


3 participants