feat: optimize scheduling condition semantics #3741

whitewindmills · 2023-06-30T09:44:46Z

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #3586

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Introduced new schedule condition reasons: `NoClusterFit`, `SchedulerError`, `Unschedulable`, and `Success`.

codecov-commenter · 2023-06-30T10:02:54Z

Codecov Report

Merging #3741 (6eb8f9a) into master (9bac51d) will decrease coverage by 0.94%.
The diff coverage is 23.52%.

@@            Coverage Diff             @@
##           master    #3741      +/-   ##
==========================================
- Coverage   56.61%   55.68%   -0.94%     
==========================================
  Files         221      225       +4     
  Lines       20831    21335     +504     
==========================================
+ Hits        11794    11880      +86     
- Misses       8413     8822     +409     
- Partials      624      633       +9

Flag	Coverage Δ
unittests	`55.68% <23.52%> (-0.94%)`	⬇️

Impacted Files	Coverage Δ
pkg/scheduler/core/generic_scheduler.go	`0.00% <0.00%> (ø)`
pkg/scheduler/scheduler.go	`17.88% <0.00%> (-0.12%)`	⬇️
pkg/scheduler/core/division_algorithm.go	`85.71% <100.00%> (ø)`
pkg/scheduler/core/util.go	`75.40% <100.00%> (+0.83%)`	⬆️
pkg/scheduler/helper.go	`93.54% <100.00%> (+1.71%)`	⬆️

... and 9 files with indirect coverage changes

RainbowMango

/assign

pkg/scheduler/core/util.go

XiShanYongYe-Chang · 2023-07-05T09:52:21Z

/assign

whitewindmills · 2023-07-07T05:56:55Z

PTAL @XiShanYongYe-Chang @RainbowMango @jwcesign

jwcesign · 2023-07-07T06:28:16Z

LGTM

XiShanYongYe-Chang

Thanks a lot~
/lgtm

Ask @Garrybest to help take a look.
/cc @Garrybest

XiShanYongYe-Chang · 2023-07-07T06:58:16Z

Hi @whitewindmills, could you make the release note clearer, for example by adding the new Reason names? It will be more intuitive when the version is released.

whitewindmills · 2023-07-07T07:21:09Z

> Hi @whitewindmills, could you make the release note clearer, for example by adding the new Reason names? It will be more intuitive when the version is released.

Sure.

Garrybest · 2023-07-07T07:30:17Z

/assign

Garrybest · 2023-07-07T07:52:07Z

pkg/scheduler/core/generic_scheduler.go

-		return result, &framework.FitError{
-			NumAllClusters: clusterInfoSnapshot.NumOfClusters(),
-			Diagnosis:      diagnosis,
+		return result, &NoClusterError{


I don't think we need a new error type here.

Garrybest · 2023-07-07T08:01:37Z

pkg/scheduler/scheduler.go

-	scheduleSuccessMessage = "Binding has been scheduled"
+	scheduleSuccessReason    = "BindingScheduled"
+	scheduleFailedReason     = "BindingFailedScheduling"
+	noClusterAvailableReason = "NoClusterAvailable"


These two types are confusing; what's the difference?

I'd like to refer to what Kubernetes does.

We can't find any clusters that pass the filter plugins.

    message: 0/1 clusters are available: 1 cluster(s) didn't match the placement cluster affinity constraint.
    reason: Unschedulable
    status: "False"
    type: BindingScheduled

We can't assign replicas because there are not enough resources.

    message: failed to assignReplicas: xxxxxx
    reason: Unschedulable
    status: "False"
    type: BindingScheduled

These two cases cover the unschedulable condition.
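
For illustration, a message in the "0/N clusters are available" style quoted in this comment could be assembled from per-reason counts of filtered-out clusters. This is a minimal sketch, not the scheduler's actual code: `fitMessage` and its `failed` map are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fitMessage is a hypothetical helper. It turns a count of filtered-out
// clusters per failure reason into a single condition message.
func fitMessage(total int, failed map[string]int) string {
	if len(failed) == 0 {
		return fmt.Sprintf("0/%d clusters are available.", total)
	}
	parts := make([]string, 0, len(failed))
	for reason, n := range failed {
		parts = append(parts, fmt.Sprintf("%d cluster(s) %s", n, reason))
	}
	sort.Strings(parts) // map iteration order is random; sort for stable output
	return fmt.Sprintf("0/%d clusters are available: %s.", total, strings.Join(parts, ", "))
}

func main() {
	fmt.Println(fitMessage(1, map[string]int{
		"didn't match the placement cluster affinity constraint": 1,
	}))
}
```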

NoCluster is for scenarios without clusters. As discussed before, we need to distinguish between scenarios with and without clusters. For scenarios without clusters, retry scheduling actually does not make any sense, but per our requirements we need to treat it as a successful scheduling. If we introduce Unschedulable, how would we define it?

Can you clarify what "without clusters" means? Do you mean that no clusters pass the filter plugins?

Yeah, and also the case where there is no member cluster.

> For scenarios without clusters, retry scheduling actually does not make any sense

What if a new cluster is added, or a taint is removed? Should we retry?

I don't think this is a successful scheduling. In Kubernetes, a pod is not scheduled successfully unless it is truly assigned to a node. We can easily distinguish no-cluster availability from a scheduler internal error by reason. Unschedulable is the typical reason when no cluster fits or there are not enough resources.

Got it. But I guess a case with no suitable cluster is an Unschedulable one and an internal error is a SchedulerError one, isn't it?

Of course, you are right. Can we separate the Unschedulable reason into more precise ones? Maybe we don't need Unschedulable at all and can split it into specific reasons such as NoClusterAvailable, NoClusterFit, and InsufficientResources. The internal error would still be SchedulerError. How do you feel about this?

I'm OK with it. But I'm not sure about the difference between NoClusterAvailable and NoClusterFit. Can we merge them?

Sure, we can name it NoClusterFit. But for NoClusterFit, I think we do not need to return an error to retry scheduling, because a retry makes no sense: resources will be scheduled when a new member cluster joins or the scheduling requirements are met.
Let's summarize, it looks like this:

Unschedulable: no cluster fit

    message: no clusters available to schedule
    reason: NoClusterFit
    status: "False"
    type: Scheduled

Unschedulable: insufficient resources

    message: cluster xxx has no sufficient memory
    reason: InsufficientResources
    status: "False"
    type: Scheduled

Internal error: e.g. can not access api-server...

    message: xxx timeout
    reason: SchedulerError
    status: "False"
    type: Scheduled

Schedule successfully

    message: Binding has been scheduled
    reason: BindingScheduled
    status: "True"
    type: Scheduled

Great! Now we have reached an agreement.

XiShanYongYe-Chang · 2023-07-17T08:22:25Z

Requesting another review from @Garrybest.

whitewindmills · 2023-07-17T08:40:38Z

I slightly adjusted the results of our previous discussions. I think NoClusterFit should be separated from Unschedulable because it is a special case: for NoClusterFit, we do not need to return an error to retry scheduling. Now it looks like the following.

    message: Binding has been scheduled successfully.
    reason: Success
    status: "True"
    type: Scheduled

    message: 0/3 clusters are available: 3 cluster(s) didn't match the placement
      cluster affinity constraint.
    reason: NoClusterFit
    status: "False"
    type: Scheduled

    message: Clusters available replicas 3 are not enough to schedule.
    reason: Unschedulable
    status: "False"
    type: Scheduled

    message: failed to select clusters.
    reason: SchedulerError
    status: "False"
    type: Scheduled
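
The four conditions above can be sketched as a small mapping from scheduling outcome to condition. This is a hedged illustration only: `Condition`, `outcome`, and `conditionFor` are hypothetical names that model just the four fields shown, and the retry flag for `Unschedulable` and `SchedulerError` is an assumption inferred from the statement that only `NoClusterFit` skips returning an error.

```go
package main

import "fmt"

// Condition models only the four fields shown in the examples above.
type Condition struct {
	Type, Status, Reason, Message string
}

// outcome is a hypothetical classification of one scheduling attempt.
type outcome int

const (
	scheduled outcome = iota
	noClusterFit
	unschedulable
	schedulerError
)

// conditionFor returns the Scheduled condition for an outcome and whether
// the attempt should be retried (assumption: everything except success and
// NoClusterFit is retried).
func conditionFor(o outcome, msg string) (Condition, bool) {
	c := Condition{Type: "Scheduled", Status: "False", Message: msg}
	switch o {
	case scheduled:
		c.Status, c.Reason = "True", "Success"
		return c, false
	case noClusterFit:
		c.Reason = "NoClusterFit"
		return c, false // rescheduled later when clusters or requirements change
	case unschedulable:
		c.Reason = "Unschedulable"
		return c, true
	default:
		c.Reason = "SchedulerError"
		return c, true
	}
}

func main() {
	c, retry := conditionFor(noClusterFit, "0/3 clusters are available")
	fmt.Println(c.Reason, c.Status, retry) // NoClusterFit False false
}
```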

/cc @Garrybest @XiShanYongYe-Chang

pkg/apis/work/v1alpha2/binding_types.go

pkg/scheduler/scheduler.go

pkg/scheduler/core/util.go

pkg/scheduler/helper.go

pkg/scheduler/scheduler.go

Signed-off-by: whitewindmills <[email protected]>

whitewindmills · 2023-07-17T10:48:22Z

PTAL~ @Garrybest

Garrybest · 2023-07-18T02:43:27Z

Good job.

/lgtm
/approve

karmada-bot · 2023-07-18T02:43:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Garrybest

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Garrybest]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

XiShanYongYe-Chang · 2023-07-18T03:18:31Z

Good job!
Thanks~

RainbowMango reviewed Jul 3, 2023

karmada-bot assigned RainbowMango Jul 3, 2023

jwcesign reviewed Jul 4, 2023

pkg/scheduler/core/util.go (outdated)

karmada-bot assigned XiShanYongYe-Chang Jul 5, 2023

whitewindmills force-pushed the schedule-condition branch from 11ebb19 to 247f0f0 Compare July 6, 2023 10:15

karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 6, 2023

whitewindmills force-pushed the schedule-condition branch from 247f0f0 to 8324ed5 Compare July 7, 2023 03:42

XiShanYongYe-Chang reviewed Jul 7, 2023

karmada-bot requested a review from Garrybest July 7, 2023 06:56

karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2023

karmada-bot assigned Garrybest Jul 7, 2023

Garrybest reviewed Jul 7, 2023

whitewindmills force-pushed the schedule-condition branch from 8324ed5 to 532fc73 Compare July 17, 2023 06:56

karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 17, 2023

whitewindmills force-pushed the schedule-condition branch from 532fc73 to 6eb8f9a Compare July 17, 2023 07:35

karmada-bot requested review from Garrybest and XiShanYongYe-Chang July 17, 2023 08:40

Garrybest reviewed Jul 17, 2023

add new schedule condition reason

45c995a

Signed-off-by: whitewindmills <[email protected]>

whitewindmills force-pushed the schedule-condition branch from 6eb8f9a to 45c995a Compare July 17, 2023 10:42

whitewindmills changed the title ~~feat: add no cluster schedule condition reason~~ feat: optimize scheduling condition semantics Jul 17, 2023

karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 18, 2023

karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 18, 2023

karmada-bot merged commit 71de164 into karmada-io:master Jul 18, 2023
11 checks passed

whitewindmills deleted the schedule-condition branch July 18, 2023 02:48

lxtywypc mentioned this pull request Nov 20, 2023

always update scheduler observed generation when scheduling result being patched successfully #4251

Open
