
Use actual node resource utilization in the strategy "LowNodeUtilization" #225

Closed
zhiyxu opened this issue Feb 3, 2020 · 47 comments · Fixed by #1555
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@zhiyxu

zhiyxu commented Feb 3, 2020

Currently, pods' resource requests are used to compute node resource utilization in the "LowNodeUtilization" strategy. Would it be more rational to use actual node resource utilization as the criterion?

It is common for a pod's resource limit to be larger than its request, so after the default scheduler places pods (based on resource requests), the cluster probably looks balanced. But a pod's actual resource usage can be much larger than its request, which may put some nodes under pressure.

So would it be more reasonable to integrate with the metrics server and use actual node resource utilization?
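
For illustration, here is a rough Go sketch (not descheduler code; it assumes metrics-server is installed and uses the standard metrics.k8s.io client) of reading actual node usage and comparing it to allocatable capacity:

package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Load kubeconfig the same way kubectl does.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	coreClient := kubernetes.NewForConfigOrDie(config)
	metricsClient := metricsclient.NewForConfigOrDie(config)

	// Listing node metrics fails if metrics-server is not installed.
	nodeMetrics, err := metricsClient.MetricsV1beta1().NodeMetricses().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, nm := range nodeMetrics.Items {
		node, err := coreClient.CoreV1().Nodes().Get(context.TODO(), nm.Name, metav1.GetOptions{})
		if err != nil {
			continue
		}
		// Compare measured usage against allocatable, not against requests.
		usedCPU := nm.Usage[v1.ResourceCPU]
		allocCPU := node.Status.Allocatable[v1.ResourceCPU]
		pct := float64(usedCPU.MilliValue()) / float64(allocCPU.MilliValue()) * 100
		fmt.Printf("%s: %.1f%% CPU actually in use\n", nm.Name, pct)
	}
}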

@seanmalloy
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 3, 2020
@seanmalloy
Member

seanmalloy commented Feb 3, 2020

@zhiyxu it looks like you are not the first person to request this feature. See the discussions in #123, #118, and #7. Based on the discussions in those issues, it looks like the descheduler LowNodeUtilization strategy still uses requests because this aligns with how the k8s scheduler works. Also, this feature is mentioned in the roadmap.

@damemi @aveshagarwal @ravisantoshgudimetla has anything changed recently to enable the k8s scheduler to use real load metrics during scheduling? For example, could the new scheduler framework somehow enable this feature in the scheduler? Maybe a custom plugin using the scheduler framework could be created to take real load metrics into account?
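
As a rough illustration of that last idea, a score plugin skeleton might look like this (purely hypothetical; loadFor stands in for a metrics lookup that does not exist as a real API, and the framework import path matches recent Kubernetes releases):

package realloadscore

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// RealLoadScore is a hypothetical score plugin that prefers nodes whose
// measured utilization is lower.
type RealLoadScore struct{}

var _ framework.ScorePlugin = &RealLoadScore{}

func (pl *RealLoadScore) Name() string { return "RealLoadScore" }

// Score maps lower observed CPU utilization to a higher node score.
func (pl *RealLoadScore) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
	utilizationPct := loadFor(nodeName) // hypothetical 0-100 metrics lookup
	return framework.MaxNodeScore - int64(utilizationPct), nil
}

func (pl *RealLoadScore) ScoreExtensions() framework.ScoreExtensions { return nil }

// loadFor is a placeholder; a real plugin would query the metrics API or a
// dedicated load watcher.
func loadFor(nodeName string) float64 { return 0 }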

@zhiyxu
Author

zhiyxu commented Feb 14, 2020

@seanmalloy @ravisantoshgudimetla @damemi @aveshagarwal Any update or plan about this feature?

@kangtiann

+1, we need this feature too.

Can we make a PR for this?

@seanmalloy @ravisantoshgudimetla @damemi @aveshagarwal

@seanmalloy
Member

seanmalloy commented Feb 25, 2020

@zhiyxu and @kangtiann here are my initial thoughts on what the API spec might look like. Please let me know what you think. I'm pretty confident the v1alpha2 LowNodeUtilization strategy will need to be adjusted.

I believe it would be a good idea to write a proposal for this and have SIG scheduling review it.

Create a new v1alpha1 LowNodeAllocation strategy. This strategy will work identically to the request-based v1alpha1 LowNodeUtilization strategy. The use of the word allocation is inspired by the discussion in #7.

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeAllocation":
     enabled: true
     params:
       nodeResourceAllocatedThresholds:
         thresholds:
           "cpu" : 20
           "memory": 20
           "pods": 20
         targetThresholds:
           "cpu" : 50
           "memory": 50
           "pods": 50

Create a new v1alpha2 LowNodeUtilization strategy. This strategy will get data from the metrics API to evict pods from nodes based on actual node utilization metrics. The below proposed YAML API spec is a rough draft and will need to be refined.

The HPA supports custom metrics. Does the descheduler need to support custom metrics too?

Keep in mind that the k8s scheduler does not take actual node utilization into account when scheduling pods. Pods evicted by this strategy could end up being scheduled on the same node again. Maybe this strategy could be paired with a yet to be created out of tree scheduler plugin that takes node utilization into account when scheduling pods. See the discussions in #123 and #118.

apiVersion: "descheduler/v1alpha2".   # Bump to v1alpha2
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
     enabled: true
     params:
       nodeResourceUtilizationThresholds:
         thresholds:
           "cpu" : 20
           "memory": 20
         targetThresholds:
           "cpu" : 50
           "memory": 50

@seanmalloy
Member

> Can we make a PR for this?

@kangtiann just want to clarify are you willing to implement this and submit a PR with the required code changes?

@seanmalloy
Member

Also, keep in mind that the kubelet will evict pods when a node starts running out of memory or disk, https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-signals.

@damemi
Contributor

damemi commented Feb 25, 2020

Would definitely like to get more feedback from the scheduling SIG on the feasibility of this. Getting "actual" pod usage has been a tricky problem I've personally hit trying to debug flaky e2es, and I'm not totally caught up on the current state of getting that info.

However, I like @seanmalloy's proposal because we already have this strategy that uses resource requests, which users may desire/prefer/expect, but I don't think it would require an entirely new strategy. I think a simple boolean on the current strategy to flip between spec resources and "actual" resources would be less confusing in code and usage.
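
For example, the existing params could grow a single flag; the field name useActualUtilization below is hypothetical, not an existing descheduler option:

package api

// NodeResourceUtilizationThresholds sketches one possible shape for the
// boolean idea; the threshold fields mirror today's strategy params.
type NodeResourceUtilizationThresholds struct {
	Thresholds       map[string]int `json:"thresholds"`
	TargetThresholds map[string]int `json:"targetThresholds"`
	// UseActualUtilization is hypothetical: false keeps request-based
	// accounting; true reads node usage from the metrics API instead.
	UseActualUtilization bool `json:"useActualUtilization,omitempty"`
}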

@zhiyxu
Author

zhiyxu commented Feb 28, 2020

@seanmalloy The proposal is great, and there are some further details to consider:

  1. Creating a v1alpha1 version of the LowNodeAllocation strategy to replace LowNodeUtilization is a good idea, but it will result in two resource types in v1alpha1 that have exactly the same effect, which would be a little confusing. Would it be better to create these two resource types directly in v1alpha2? Of course, LowNodeUtilization would then no longer be backward compatible anyway.

  2. @damemi Is it possible that customers would want to use both the LowNodeAllocation and LowNodeUtilization strategies simultaneously? Maybe these two strategies are not mutually exclusive.

  3. To make LowNodeUtilization take effect in v1alpha2, we need to build a scheduler framework plugin whose policy takes real-time node utilization into account. The customer then needs to either run a scheduler that includes the plugin as a pod in the cluster and change the spec.schedulerName of the relevant pods to that scheduler's name, or, more directly, replace the original kube-scheduler with the new scheduler containing the plugin so that it affects all pods in the cluster. Either way would greatly increase the difficulty of using the project.

  4. If the Metrics API is not installed in the customer's cluster, real-time node metrics can't be gathered, and this strategy would be completely useless (see the availability-check sketch after this list).

  5. Custom extended resources are another issue we need to consider further.
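
On point 4, a minimal availability check could let the strategy disable itself (or fall back to request-based accounting) when metrics-server is absent. A sketch using the standard discovery client:

package metricscheck

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// metricsAPIAvailable reports whether the cluster serves the
// metrics.k8s.io/v1beta1 group/version that metrics-server registers.
func metricsAPIAvailable(config *rest.Config) bool {
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		return false
	}
	_, err = dc.ServerResourcesForGroupVersion("metrics.k8s.io/v1beta1")
	return err == nil
}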

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 28, 2020
@seanmalloy
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 29, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2020
@seanmalloy
Member

seanmalloy commented Aug 27, 2020

There was a recent proposal at the SIG Scheduling meeting to add a scheduler plugin to take real load metrics into account during scheduling.

https://docs.google.com/presentation/d/13tleXxfPHRnW_-desRTzOZwpRDJX5u4MPlnQxNs15IU/edit#slide=id.g8fcfb6bb75_2_14

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2020
@seanmalloy
Member

Here is the KEP document for Real Load Aware Scheduling: https://docs.google.com/document/d/1ffBpzhqELmhqJxdGMzYzIOoigxn3J0zlP1_nie34f9s/edit#

@seanmalloy
Member

Updated KEP document for Real Load Aware Scheduling:
kubernetes-sigs/scheduler-plugins#61

@pgiles

pgiles commented Oct 2, 2020

After evaluating Descheduler, we are very hopeful it will help us rebalance our clusters. However, we cannot move forward until this feature is implemented. In short, +1 for this feature request and we'll check back often to see when it is released. Thank you!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 31, 2020
@damemi
Contributor

damemi commented Jan 4, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@stefkkkk

stefkkkk commented Jan 22, 2021

Any updates on this, please? It would be a very useful feature.

@damemi
Contributor

damemi commented Jan 22, 2021

@Stefik95 the linked enhancements around real load aware scheduling are still being worked on (mainly in the scheduler-plugins repo, under the "Trimaran" name).

It was mentioned above, but getting actual pod consumption relies on access to the metrics api. To move forward with this, we should look into what we need to be able to access those metrics from within descheduler (and fallbacks/disable when those metrics aren't available). Any help with this step is welcome, it would likely follow a similar pattern to Trimaran's metrics collection.

As a side note, there were also metrics recently added to report the scheduler's "observed" usage based on limits/requests for administrators to compare to real usage (kubernetes/enhancements#1916 and kubernetes/kubernetes#94866). This is intended to help admins optimize their requests and limits to better reflect actual values.

@stefkkkk

stefkkkk commented Jan 22, 2021

> @Stefik95 the linked enhancements around real load aware scheduling are still being worked on (mainly in the scheduler-plugins repo, under the "Trimaran" name). […]

Thanks for the answer! Could you please tell me whether this is true: at the moment, LowNodeUtilization works on the requests that were set when the pod was deployed, not on changes to the pod's requests over time?

@damemi damemi pinned this issue Jan 27, 2022
@damemi
Contributor

damemi commented Jan 27, 2022

Pinning this issue as it is a common request

See also #225, #437, #270, #118, #90, #702

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 30, 2022
@damemi
Contributor

damemi commented May 2, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 2, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2022
@metost

metost commented Jul 31, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 29, 2022
@robertchgo

This would be a really useful feature for us. Are there any updates on this?

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 4, 2022
@damemi
Contributor

damemi commented Nov 4, 2022

@robertchgo Not at the moment. A few people have offered to implement it, as discussed above, but there has been no progress so far. With other ongoing work, this is a backlog feature right now.

/lifecycle frozen

@binacs
Member

binacs commented Mar 11, 2023

Hello everyone! I have an MR (#1087) that tries to solve this problem, and I look forward to everyone's review comments to make it better.

I hope it helps.

@joenzx

joenzx commented Sep 14, 2023

This feature would be really useful. When will it be available?
