
How is shared-state implemented? #37

Open
katoomegumi opened this issue Mar 24, 2024 · 15 comments

Comments

@katoomegumi

According to the paper, godel-scheduler is a shared-state scheduler. Where can I find the implementation in the code? In particular, how is the global cluster view synchronized?

@NickrenREN
Collaborator

It's based on the list-watch mechanism; a minimal sketch of the pattern follows below.
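
For readers new to the pattern: each scheduler builds and maintains its cluster view through client-go informers, which first LIST the current state from the apiserver and then WATCH for incremental changes. A minimal, hypothetical sketch using standard client-go (illustrative only, not Godel's actual code):

// list_watch_sketch.go: a minimal list-watch example (illustrative only).
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative setup).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The informer LISTs all pods once, then WATCHes for create/update/delete
	// events, keeping its local store continuously in sync with the apiserver.
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("added:", obj.(*v1.Pod).Name) },
		UpdateFunc: func(_, newObj interface{}) { fmt.Println("updated:", newObj.(*v1.Pod).Name) },
		DeleteFunc: func(obj interface{}) { fmt.Println("deleted a pod") },
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, podInformer.HasSynced)
	<-stop // block forever; the watch stream keeps the local view current
}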

@katoomegumi
Author

Sorry, I've read about list-watch but still have difficulties. The list-watch mechanism mainly works by monitoring events such as create and delete. What puzzles me is how multiple schedulers obtain the global cluster view. Does every scheduler watch all of these events, so that they don't need synchronization; or do they synchronize with a central global cluster view at a certain frequency (as opposed to real-time synchronization)? And which struct in the code serves as the central global cluster view? I'm not sure about that. Is it the commonCache in the Binder struct, the generationstore, or some other struct?

@NickrenREN
Collaborator

@katoomegumi Each scheduler instance watches all events from the apiserver (backed by etcd); the instances don't need to sync up with each other.

@katoomegumi
Author

@NickrenREN Thanks. I assumed it would be impossible to sync the scheduler's cache on every event, so I thought the code defined a time interval at which the cache is synced from events. Is that true?

// pkg/scheduler/scheduler.go
// func Run
if utilfeature.DefaultFeatureGate.Enabled(features.SchedulerCacheScrape) {
	// The metrics agent scrapes the endpoint every 5s and flushes the metrics
	// to the metrics server every 30s. To be more precise, scrape cache
	// metrics every 5s.
	go wait.Until(func() {
		sched.commonCache.ScrapeCollectable(sched.metricsRecorder)
		sched.metricsRecorder.UpdateMetrics()
	}, 5*time.Second, sched.StopEverything)
}

@NickrenREN
Collaborator

@katoomegumi No, the scheduler receives every event and reacts to it (updating its cache and queue based on the events). The code you posted is for collecting metrics, not the cache-syncing logic. A toy sketch of the event-driven path is below.
BTW, the Godel scheduler is built on the basis of Kubernetes; you can spend more time on Kubernetes and etcd.
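
To make the distinction concrete, here is a toy sketch of the event-driven update path. The toyCache type below merely stands in for Godel's real scheduler cache (e.g. commonCache); it is illustrative only, not the project's API:

package sketch

import (
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// toyCache is a stand-in for the scheduler's local cluster view. The handlers
// mutate it on every single watch event, so there is no timer-driven "sync
// interval" anywhere in the update path.
type toyCache struct {
	mu   sync.Mutex
	pods map[string]*v1.Pod
}

func newToyCache() *toyCache {
	return &toyCache{pods: make(map[string]*v1.Pod)}
}

func (c *toyCache) set(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[p.Namespace+"/"+p.Name] = p
}

func (c *toyCache) remove(p *v1.Pod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pods, p.Namespace+"/"+p.Name)
}

// eventHandlers plugs into an informer exactly like the earlier sketch; each
// callback applies the event to the local cache immediately.
func eventHandlers(c *toyCache) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { c.set(obj.(*v1.Pod)) },
		UpdateFunc: func(_, newObj interface{}) { c.set(newObj.(*v1.Pod)) },
		DeleteFunc: func(obj interface{}) {
			if p, ok := obj.(*v1.Pod); ok { // obj may also be a deletion tombstone
				c.remove(p)
			}
		},
	}
}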

@Wang-Xinkai

@NickrenREN Thanks for the reply. Actually, we are interested in the "watch delay" in the Godel scheduler, i.e., the duration between an event occurring in etcd (e.g., a cluster resource change) and each scheduler actually observing that event (updating its cache). Obviously, with higher QPS and a larger cluster, the "watch delay" becomes more severe... Admittedly, it is an inherent problem of K8s itself, but we wonder whether Godel has characterized or specifically optimized the "watch delay"?

FYI, the related discussion in k8s repo: kubernetes/kubernetes#108556

@NickrenREN
Collaborator

@Wang-Xinkai hello, we optimize the "event latency" from two aspects:

  • one is the server side: we don't use etcd for large-scale clusters at Bytedance; we use KubeBrain + ByteKV instead. This is not done in the Godel Scheduler itself.
  • the other is the client side: we optimized the event-processing workflow in the Godel Scheduler so that events won't get stuck in the delta queue (a rough sketch of the idea follows this list).
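
Neither optimization is spelled out in this thread, so purely as a hypothetical illustration of the client-side idea: let the informer callbacks do nothing but enqueue, and have a pool of workers do the heavy processing, so events never pile up in the delta queue behind a slow handler. This is not Godel's actual code:

package sketch

// event is a hypothetical envelope for a watch notification.
type event struct {
	kind string      // "add" | "update" | "delete"
	obj  interface{} // the watched object
}

// startWorkers drains the buffered channel with several goroutines; the
// informer callbacks only need to send to the channel and return immediately,
// so the delta queue behind them keeps flowing.
func startWorkers(events <-chan event, workers int, handle func(event)) {
	for i := 0; i < workers; i++ {
		go func() {
			for ev := range events {
				handle(ev) // heavy work happens off the informer's goroutine
			}
		}()
	}
}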

@Wang-Xinkai

Thanks. I have checked the client-side optimizations.
As I understand it, with these dual-side optimizations of the "event delay", a realistic Godel deployment has a (near) real-time resource view of its corresponding sub-cluster? In that case, the event delay would relate only to the network communication cost between the apiserver and the scheduler.

@NickrenREN
Collaborator

NickrenREN commented Apr 10, 2024

@Wang-Xinkai My understanding is that event latency depends on three parts: 1. apiserver and etcd processing efficiency; 2. network conditions between the apiserver and the client; 3. client processing efficiency.

We are now optimizing 1 and 3 to accelerate the event-processing flow. But we can't guarantee that everything (on both the server and client sides) will always be fine, so we can't say the event delay relates only to the network communication cost between the apiserver and the scheduler.

Network conditions have nothing to do with the k8s ecosystem, but in the future we could explore whether we can simplify the interaction between Godel scheduler components. E.g., right now all Godel scheduler components get events from the apiserver; could we let them talk to each other directly? ...

@Wang-Xinkai

Interesting idea lol. I agree with you on the event-delay decomposition! Do you have a cursory estimate of the scale of the event delay under normal and extremely high-load scenarios: tens of ms, hundreds of ms, or second-scale?

I mean, if the event delay is large, the scheduler is blind to some "free resource" in the cluster for that duration, which badly wastes cluster resources (if there are tasks waiting to be scheduled). That's why we are interested in this metric. Thanks.

@NickrenREN
Collaborator

@Wang-Xinkai IIUC, you are worried about the latency of Node resource update events?

@Wang-Xinkai

Right, do you have any thoughts on this issue? Or experience with the actual latency in realistic clusters? We suspect it affects the resource visibility of schedulers…

@NickrenREN
Collaborator

@Wang-Xinkai In Kubernetes, different resources (nodes, pods, ...) travel different event-transmission paths. The number of nodes is not that large, so node resources are less likely to cause latency issues. At least at Bytedance, we have never met this kind of problem (our largest single cluster: 20k nodes, 1,000k pods).

@Wang-Xinkai

Okay, thanks for your generous replies. We will use Godel to study shared-state schedulers further. Keep in touch!

@NickrenREN
Collaborator

@Wang-Xinkai Cool, if you have any questions, feel free to reach out to me.
