feat: optimize scheduling condition semantics #3741
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3741 +/- ##
==========================================
- Coverage 56.61% 55.68% -0.94%
==========================================
Files 221 225 +4
Lines 20831 21335 +504
==========================================
+ Hits 11794 11880 +86
- Misses 8413 8822 +409
- Partials 624 633 +9
Flags with carried forward coverage won't be shown.
/assign
LGTM
Thanks a lot~
/lgtm
Asking @Garrybest to help take a look.
/cc @Garrybest
Hi @whitewindmills, can you help make the release note clearer, for example, by adding the new Reason names? It will be more intuitive when the version is released.
Sure.
/assign
return result, &framework.FitError{
    NumAllClusters: clusterInfoSnapshot.NumOfClusters(),
    Diagnosis:      diagnosis,
return result, &NoClusterError{
I don't think we need a new error type here.
pkg/scheduler/scheduler.go (outdated)
scheduleSuccessMessage   = "Binding has been scheduled"
scheduleSuccessReason    = "BindingScheduled"
scheduleFailedReason     = "BindingFailedScheduling"
noClusterAvailableReason = "NoClusterAvailable"
These two types are confusing; what's the difference?
I'd like to refer to what Kubernetes does.
- We can't find any clusters that pass the filter plugins.
message: 0/1 clusters are available: 1 cluster(s) didn't match the placement cluster affinity constraint.
reason: Unschedulable
status: "False"
type: BindingScheduled
- We can't assign replicas due to insufficient resources.
message: failed to assignReplicas: xxxxxx
reason: Unschedulable
status: "False"
type: BindingScheduled
These two types will cover the unschedulable condition.
NoCluster is for the scenarios without clusters. As discussed before, we need to distinguish between scenarios with and without clusters. For scenarios without clusters, retrying scheduling actually does not make any sense, but from our requirements, we need to regard it as a successful scheduling. If we introduce Unschedulable, how would we define it?
Can you enlighten me about what "without clusters" means? Do you mean no clusters pass the filter plugins?
Yeah, and also the case where there is no member cluster at all.
"For scenarios without clusters, retry scheduling actually does not make any sense"
What if a new cluster is added, or a taint is removed, should we retry?
I don't think this is a successful scheduling. For Kubernetes, a pod is not scheduled successfully unless it is truly assigned to a node. We can easily distinguish no-cluster availability and scheduler internal errors by reason. Unschedulable is a typical reason for no more clusters fitting or not enough resources.
Got it. But I guess "no suitable cluster" is an Unschedulable one and "internal error" is a SchedulerError one, isn't it?
Of course you are right. Can we separate the Unschedulable reason into more precise ones? Maybe we don't need Unschedulable but can separate it into specific reasons, like NoClusterAvailable, NoClusterFit, and InsufficientResources, etc. And the internal error is still SchedulerError. How do you feel about this?
I'm OK with it. But I'm not sure about the difference between NoClusterAvailable and NoClusterFit. Can we merge them?
Sure, we can name it NoClusterFit. But for NoClusterFit, I think we do not need to return an error to retry scheduling, because it makes no sense; resources will be scheduled when a new member cluster joins or the scheduling requirements are met.
Let's summarize; it looks like this:
- Unschedulable: no cluster fit
  message: no clusters available to schedule
  reason: NoClusterFit
  status: "False"
  type: Scheduled
- Unschedulable: insufficient resources
  message: cluster xxx has no sufficient memory
  reason: InsufficientResources
  status: "False"
  type: Scheduled
- Internal error: e.g. cannot access the api-server
  message: xxx timeout
  reason: SchedulerError
  status: "False"
  type: Scheduled
- Scheduled successfully
  message: Binding has been scheduled
  reason: BindingScheduled
  status: "True"
  type: Scheduled
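The agreed mapping above could be sketched in Go roughly as follows. The constant names and the helper function here are illustrative only (not the PR's actual code), and a minimal local struct stands in for metav1.Condition to keep the sketch self-contained:

```go
package main

import "fmt"

// Illustrative reason constants mirroring the summary above;
// the actual identifiers in the PR may differ.
const (
	conditionTypeScheduled      = "Scheduled"
	reasonBindingScheduled      = "BindingScheduled"
	reasonNoClusterFit          = "NoClusterFit"
	reasonInsufficientResources = "InsufficientResources"
	reasonSchedulerError        = "SchedulerError"
)

// condition is a minimal stand-in for metav1.Condition,
// used only to keep this example self-contained.
type condition struct {
	Type, Status, Reason, Message string
}

// newScheduledCondition maps a scheduling outcome to the Scheduled
// condition: only a successful scheduling yields Status "True";
// every failure reason yields Status "False".
func newScheduledCondition(reason, message string) condition {
	status := "False"
	if reason == reasonBindingScheduled {
		status = "True"
	}
	return condition{
		Type:    conditionTypeScheduled,
		Status:  status,
		Reason:  reason,
		Message: message,
	}
}

func main() {
	c := newScheduledCondition(reasonNoClusterFit,
		"0/1 clusters are available: 1 cluster(s) didn't match the placement cluster affinity constraint")
	fmt.Printf("type=%s status=%s reason=%s\n", c.Type, c.Status, c.Reason)
}
```

Under this scheme, NoClusterFit does not trigger a scheduling retry; a new cluster joining or a placement change re-triggers scheduling instead, while SchedulerError remains retryable.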
Great! Now we have reached an agreement.
Asking @Garrybest for another review.
I slightly adjusted the results of our previous discussions. I think
Signed-off-by: whitewindmills <[email protected]>
PTAL~ @Garrybest
Good job. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Garrybest The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Good job!
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #3586
Special notes for your reviewer:
Does this PR introduce a user-facing change?: