This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

[scheduler, mtsource] Even Spread across Nodes in each zone for HA scheduler #707

Closed
wants to merge 2 commits into from

Conversation

aavarghese
Contributor

Fixes #593
🎁 Implementing HA across nodes within a zone as a new scheduling strategy

        - name: SCHEDULER_POLICY_TYPE
          value: 'EVENSPREAD_BYNODE'

Continuation of #587

Proposed Changes

  • The controller's SCHEDULER_POLICY_TYPE env var, which controls the "scheduler policy type", accepts a new value: 'EVENSPREAD_BYNODE'.
  • "Scheduler policy type" is an enum with three values: MaxFillup, EvenSpread, and EvenSpreadByNode. With EvenSpreadByNode, the scheduler places vreplicas for HA by spreading them equally across the nodes within a zone. Within a node, each pod is filled with vreplicas up to its capacity.
  • Placement in the KafkaSource status additionally records NodeName when EvenSpreadByNode scheduling is used.
  • HA controls are also shared with the autoscaler and the state accessor.
  • The state accessor computes the number of nodes in the cluster and stores a map from node name to zone.
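
To make the first two bullets concrete, here is a minimal sketch of how the policy enum could be read from the environment. It is not the code in this PR: the Go identifiers, the `parsePolicy` helper, and the string values "MAXFILLUP" and "EVENSPREAD" are assumptions for illustration. Only the env var name SCHEDULER_POLICY_TYPE and the value 'EVENSPREAD_BYNODE' come from the description above.

```go
package main

import (
	"fmt"
	"os"
)

// SchedulerPolicyType is a hypothetical stand-in for the three-valued enum
// described in the proposed changes.
type SchedulerPolicyType string

const (
	// Assumed string value; not confirmed by this PR description.
	MaxFillup SchedulerPolicyType = "MAXFILLUP"
	// Assumed string value; not confirmed by this PR description.
	EvenSpread SchedulerPolicyType = "EVENSPREAD"
	// Value taken from the PR description.
	EvenSpreadByNode SchedulerPolicyType = "EVENSPREAD_BYNODE"
)

// parsePolicy (hypothetical helper) validates a raw env value against the
// known policy types and rejects anything else.
func parsePolicy(raw string) (SchedulerPolicyType, error) {
	switch p := SchedulerPolicyType(raw); p {
	case MaxFillup, EvenSpread, EvenSpreadByNode:
		return p, nil
	default:
		return "", fmt.Errorf("unknown scheduler policy type %q", raw)
	}
}

func main() {
	policy, err := parsePolicy(os.Getenv("SCHEDULER_POLICY_TYPE"))
	if err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("using scheduler policy:", policy)
}
```

Rejecting unknown values at startup, rather than silently falling back to a default, is a design choice of this sketch and may differ from what the controller actually does.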

Release Note

New SCHEDULER_POLICY_TYPE value 'EVENSPREAD_BYNODE' for a High Availability scheduler that uniformly distributes vreplicas across all nodes in each zone, reducing the impact of a node or zone failure.
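
As an illustration of the distribution described in the release note, the following is a rough sketch only, not the scheduler code in this PR. It assumes vreplicas are first split evenly across zones and then evenly across the nodes of each zone, with pods on a node filled up to capacity. The names `splitEvenly`, `spreadByNode`, and `fillPods`, and the node and zone names, are all hypothetical. The `nodeToZone` map stands in for the node-name-to-zone map kept by the state accessor.

```go
package main

import (
	"fmt"
	"sort"
)

// splitEvenly divides total across names so that counts differ by at most
// one. Names are sorted first so the result is deterministic; the earliest
// names receive the remainder.
func splitEvenly(total int, names []string) map[string]int {
	out := make(map[string]int, len(names))
	if len(names) == 0 {
		return out
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	base, extra := total/len(sorted), total%len(sorted)
	for i, name := range sorted {
		out[name] = base
		if i < extra {
			out[name]++
		}
	}
	return out
}

// spreadByNode returns the number of vreplicas assigned to each node.
// Assumption: zones get an even share first, then nodes within each zone.
func spreadByNode(vreplicas int, nodeToZone map[string]string) map[string]int {
	nodesByZone := map[string][]string{}
	for node, zone := range nodeToZone {
		nodesByZone[zone] = append(nodesByZone[zone], node)
	}
	zones := make([]string, 0, len(nodesByZone))
	for zone := range nodesByZone {
		zones = append(zones, zone)
	}
	placements := make(map[string]int, len(nodeToZone))
	for zone, share := range splitEvenly(vreplicas, zones) {
		for node, n := range splitEvenly(share, nodesByZone[zone]) {
			placements[node] = n
		}
	}
	return placements
}

// fillPods packs n vreplicas into pods on one node, filling each pod up to
// capacity before starting the next.
func fillPods(n, capacity int) []int {
	if capacity <= 0 {
		return nil
	}
	var pods []int
	for n > 0 {
		take := capacity
		if n < take {
			take = n
		}
		pods = append(pods, take)
		n -= take
	}
	return pods
}

func main() {
	nodeToZone := map[string]string{
		"node-a": "zone-1", "node-b": "zone-1",
		"node-c": "zone-2", "node-d": "zone-2",
	}
	placements := spreadByNode(10, nodeToZone)
	nodes := make([]string, 0, len(placements))
	for node := range placements {
		nodes = append(nodes, node)
	}
	sort.Strings(nodes)
	for _, node := range nodes {
		fmt.Println(node, nodeToZone[node], placements[node], fillPods(placements[node], 2))
	}
}
```

With this layout, losing a single node removes only that node's share of vreplicas, and losing a zone removes roughly one zone's share, which is the failure-impact property the release note is after.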

@knative-prow-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@knative-prow-robot knative-prow-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2021
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Jun 14, 2021
@aavarghese aavarghese changed the title Even Spread across Nodes in each zone for HA scheduler [WIP] Even Spread across Nodes in each zone for HA scheduler Jun 14, 2021
@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 14, 2021
@codecov

codecov bot commented Jun 14, 2021

Codecov Report

Merging #707 (72dbf28) into main (699e408) will increase coverage by 0.10%.
The diff coverage is 93.75%.

❗ Current head 72dbf28 differs from pull request most recent head 89ffb36. Consider uploading reports for the commit 89ffb36 to get more accurate results

@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   73.84%   73.94%   +0.10%     
==========================================
  Files         133      133              
  Lines        5880     5861      -19     
==========================================
- Hits         4342     4334       -8     
+ Misses       1315     1309       -6     
+ Partials      223      218       -5     
| Impacted Files | Coverage Δ |
|---|---|
| pkg/common/scheduler/statefulset/state.go | 84.90% <66.66%> (-0.81%) ⬇️ |
| pkg/common/scheduler/statefulset/scheduler.go | 86.10% <95.23%> (+3.19%) ⬆️ |
| pkg/common/scheduler/statefulset/autoscaler.go | 73.58% <100.00%> (-1.10%) ⬇️ |
| pkg/source/adapter/adapter.go | 58.42% <0.00%> (ø) |
| ...onsolidated/dispatcher/consumer_message_handler.go | 0.00% <0.00%> (ø) |
| pkg/channel/consolidated/utils/util.go | 100.00% <0.00%> (+4.08%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d1daa0...89ffb36. Read the comment docs.

@aavarghese aavarghese force-pushed the issue#593 branch 2 times, most recently from 45c21c3 to 13e7570 Compare June 16, 2021 17:10
@aavarghese aavarghese marked this pull request as ready for review June 16, 2021 17:10
@aavarghese aavarghese changed the title [WIP] Even Spread across Nodes in each zone for HA scheduler Even Spread across Nodes in each zone for HA scheduler Jun 16, 2021
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2021
@aavarghese
Contributor Author

/cc @lionelvillard

@aavarghese aavarghese changed the title Even Spread across Nodes in each zone for HA scheduler [scheduler, mtsource] Even Spread across Nodes in each zone for HA scheduler Jun 16, 2021
@knative-metrics-robot

The following is the coverage report on the affected files.
Say /test pull-knative-sandbox-eventing-kafka-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
|---|---|---|---|
| pkg/common/scheduler/statefulset/scheduler.go | 87.3% | 89.9% | 2.6 |
| pkg/common/scheduler/statefulset/state.go | 93.9% | 92.3% | -1.6 |

@aavarghese aavarghese marked this pull request as draft June 16, 2021 20:28
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2021
@knative-prow-robot knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 24, 2021
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aavarghese
To complete the pull request process, please assign evankanderson after the PR has been reviewed.
You can assign the PR to them by writing /assign @evankanderson in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 1, 2021
Signed-off-by: aavarghese <[email protected]>
@knative-prow-robot knative-prow-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jul 7, 2021
@aavarghese aavarghese closed this Jul 7, 2021
Labels
cla: yes Indicates the PR's author has signed the CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[scheduler,mtsource] Implementing EvenSpread HA support across nodes within a zone
3 participants