Error draining node #302
Comments
I manually
ok, I think I've figured out the chain of events to replicate the failure.
some potential solutions:
Looks like in the latest code this has been partially addressed by https://github.com/keikoproj/upgrade-manager/blob/controller-v2/controllers/upgrade.go#L121, which works in conjunction with https://github.com/keikoproj/upgrade-manager/blame/controller-v2/main.go#L93 to set a default.
Setting drainTimeout to 0 also fails with the same "global timeout: -1s" error; no idea why yet. Worked around the issue by setting drainTimeout to maxInt32 (2147483647).
That'll do it...
@shreyas-badiger
I think it's more that the kubectl drain package you're using uses 0 for infinity instead of -1; negative values essentially mean "time out immediately".
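To illustrate that mismatch, here is a minimal sketch (my own illustration, not the actual upgrade-manager code) of how a caller could normalize a configured drainTimeout, given in seconds, before building the duration for the drain call, so that unset or -1 values become 0 ("no timeout") rather than a negative duration that expires immediately:

```go
package main

import (
	"fmt"
	"time"
)

// toDrainTimeout converts a configured drainTimeout (in seconds) into the
// duration the drain call expects. In the kubectl drain helper a timeout of
// 0 means "wait indefinitely", while a negative duration expires at once,
// which matches the "global timeout: -1s" failures described above.
func toDrainTimeout(seconds int) time.Duration {
	if seconds <= 0 {
		// Treat "unset", 0, and -1 all as "no timeout" instead of passing
		// a negative duration through to the drain helper.
		return 0
	}
	return time.Duration(seconds) * time.Second
}

func main() {
	fmt.Println(toDrainTimeout(-1)) // 0s, not -1s
	fmt.Println(toDrainTimeout(0))  // 0s
	fmt.Println(toDrainTimeout(30)) // 30s
}
```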
Is this a BUG REPORT or FEATURE REQUEST?: Bug report
What happened: something very similar to #225
During a change to an instanceGroup, a RollingUpgrade is created. The RollingUpgrade correctly identifies the affected EC2 instances, attempts to drain the node, and fails with an error. Deleting the RollingUpgrade and restarting the process consistently fails in the same way.
Executing the same drain, e.g.

kubectl drain <node> --delete-local-data --ignore-daemonsets

works as expected from a local shell.

What you expected to happen: kubectl drain evicts the pods as normal.
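For comparison with that CLI invocation, below is a rough sketch of what an equivalent programmatic drain looks like through the k8s.io/kubectl/pkg/drain package. This is an assumption about how a controller might call it, not the exact upgrade-manager code; the drainNode helper and the 5-minute timeout are illustrative, and older client versions name the DeleteEmptyDirData field DeleteLocalData.

```go
package drainexample

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode evicts pods from a node roughly the way
// `kubectl drain <node> --delete-local-data --ignore-daemonsets` does.
// It assumes the node has already been cordoned.
func drainNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 ctx,
		Client:              client,
		IgnoreAllDaemonSets: true, // --ignore-daemonsets
		DeleteEmptyDirData:  true, // --delete-local-data
		GracePeriodSeconds:  -1,   // honor each pod's own terminationGracePeriodSeconds
		Timeout:             5 * time.Minute,
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}
	return drain.RunNodeDrain(helper, nodeName)
}
```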
How to reproduce it (as minimally and precisely as possible): Not sure how to reproduce it outside of my environment, but it seems to affect certain instance groups consistently.
Anything else we need to know?: The error appears to be produced immediately (about 1 second after the drain messages). There are no PDBs in use on this cluster, and all pods have terminationGracePeriodSeconds=30.
Environment:
Other debugging information (if applicable):