Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YUNIKORN-1844] PreEnqueue plugin implementation #626

Closed
wants to merge 4 commits into from

Conversation

craigcondit
Copy link
Contributor

@craigcondit craigcondit commented Jun 28, 2023

What is this PR for?

Implements the PreEnqueue() scheduling hook for the scheduler plugin implementation. This allows gating Pods that are not yet ready for scheduling due to queue pressure.

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-1844

How should this be tested?

E2E scheduling tests should continue to succeed.

Screenshots (if appropriate)

Questions:

  • - The licenses files need update.
  • - There is breaking changes for older versions.
  • - It needs documentation.

@craigcondit craigcondit self-assigned this Jun 28, 2023
@craigcondit craigcondit changed the title [WIP] PreEnqueue plugin implementation [YUNIKORN-1844] [WIP] PreEnqueue plugin implementation Jun 28, 2023
@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Merging #626 (b36ef02) into master (7e63075) will decrease coverage by 0.05%.
The diff coverage is 64.00%.

❗ Current head b36ef02 differs from pull request most recent head 5bc7ed8. Consider uploading reports for the commit 5bc7ed8 to get more accurate results

@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
- Coverage   70.86%   70.82%   -0.05%     
==========================================
  Files          50       52       +2     
  Lines        8194     8251      +57     
==========================================
+ Hits         5807     5844      +37     
- Misses       2184     2201      +17     
- Partials      203      206       +3     
Impacted Files Coverage Δ
pkg/cache/context.go 43.32% <0.00%> (-0.21%) ⬇️
pkg/cache/task.go 68.71% <66.66%> (-1.34%) ⬇️
pkg/plugin/predicates/predicate_manager.go 59.60% <66.66%> (+1.02%) ⬆️
pkg/appmgmt/interfaces/task_sched_state.go 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@craigcondit craigcondit force-pushed the YUNIKORN-1844 branch 5 times, most recently from bb4e518 to e03f4ee Compare June 29, 2023 20:20
@craigcondit craigcondit changed the title [YUNIKORN-1844] [WIP] PreEnqueue plugin implementation [YUNIKORN-1844] PreEnqueue plugin implementation Jun 29, 2023
@craigcondit craigcondit marked this pull request as ready for review June 29, 2023 20:21
@wilfred-s
Copy link
Contributor

Code looks good. I was wondering if we can refactor the code a bit and move the event setup. EventsToRegister seems a bit out of place in the context and in the predicate manager just for the plugin callback.
Should the scheduler plugin itself also move into the plugin package. instead of being its own package?

One comment on the code: instead of passing around types.UID cast it to a string and pass the string, it is always a string for us.

@craigcondit
Copy link
Contributor Author

The predicate manager is the place in the code where we instantiate and manage the default scheduler plugins. It is used by both the standard and plugin mode and so is its own package. The EventsToRegister() function was previously a static list but this is fragile and doesn't cover all the events that the various plugins might need. Restructuring this is going to cause ugly circular dependencies.

As for types.UID, that is what is returned from the Kubernetes functions - I refactored out some common methods and decided that one cast inside the method was cleaner than casting at every call site.

@wilfred-s
Copy link
Contributor

As for types.UID, that is what is returned from the Kubernetes functions - I refactored out some common methods and decided that one cast inside the method was cleaner than casting at every call site.

This is the change when we cast early: it allows a further change as the pod.UID == the taskID The change replaces 7 casts with 4, and removed 6 function calls to get the task ID.

Subject: [PATCH] cast_pod_UID
---
Index: pkg/schedulerplugin/scheduler_plugin.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/pkg/schedulerplugin/scheduler_plugin.go b/pkg/schedulerplugin/scheduler_plugin.go
--- a/pkg/schedulerplugin/scheduler_plugin.go	(revision b36ef02f609a3f54100bed32d0be68a6e9d8dd09)
+++ b/pkg/schedulerplugin/scheduler_plugin.go	(date 1688611453984)
@@ -26,7 +26,6 @@
 	"go.uber.org/zap"
 	v1 "k8s.io/api/core/v1"
 	"k8s.io/apimachinery/pkg/runtime"
-	"k8s.io/apimachinery/pkg/types"
 	"k8s.io/apimachinery/pkg/util/sets"
 	"k8s.io/client-go/informers"
 	"k8s.io/kubernetes/pkg/scheduler/framework"
@@ -103,19 +102,20 @@
 		return nil
 	}
 
-	if app, task, ok := sp.getTask(appID, pod.UID); ok {
-		if _, ok := sp.context.GetInProgressPodAllocation(string(pod.UID)); ok {
+	taskID := string(pod.UID)
+	if app, task, ok := sp.getTask(appID, taskID); ok {
+		if _, ok = sp.context.GetInProgressPodAllocation(taskID); ok {
 			// pod must have failed scheduling in a prior run, reject it and return unschedulable
 			sp.failTask(pod, app, task)
 			return framework.NewStatus(framework.UnschedulableAndUnresolvable, "Pod is not ready for scheduling")
 		}
 
-		nodeID, ok := sp.context.GetPendingPodAllocation(string(pod.UID))
+		nodeID, ok := sp.context.GetPendingPodAllocation(taskID)
 		if task.GetTaskState() == cache.TaskStates().Bound && ok {
 			log.Log(log.ShimSchedulerPlugin).Info("Releasing pod for scheduling (PreEnqueue phase)",
 				zap.String("namespace", pod.Namespace),
 				zap.String("pod", pod.Name),
-				zap.String("taskID", task.GetTaskID()),
+				zap.String("taskID", taskID),
 				zap.String("assignedNode", nodeID))
 			return nil
 		}
@@ -154,19 +154,20 @@
 		return nil, framework.NewStatus(framework.Skip)
 	}
 
-	if app, task, ok := sp.getTask(appID, pod.UID); ok {
-		if _, ok := sp.context.GetInProgressPodAllocation(string(pod.UID)); ok {
+	taskID := string(pod.UID)
+	if app, task, ok := sp.getTask(appID, taskID); ok {
+		if _, ok = sp.context.GetInProgressPodAllocation(taskID); ok {
 			// pod must have failed scheduling, reject it and return unschedulable
 			sp.failTask(pod, app, task)
 			return nil, framework.NewStatus(framework.UnschedulableAndUnresolvable, "Pod is not ready for scheduling")
 		}
 
-		nodeID, ok := sp.context.GetPendingPodAllocation(string(pod.UID))
+		nodeID, ok := sp.context.GetPendingPodAllocation(taskID)
 		if task.GetTaskState() == cache.TaskStates().Bound && ok {
 			log.Log(log.ShimSchedulerPlugin).Info("Releasing pod for scheduling (PreFilter phase)",
 				zap.String("namespace", pod.Namespace),
 				zap.String("pod", pod.Name),
-				zap.String("taskID", task.GetTaskID()),
+				zap.String("taskID", taskID),
 				zap.String("assignedNode", nodeID))
 			return &framework.PreFilterResult{NodeNames: sets.NewString(nodeID)}, nil
 		}
@@ -196,17 +197,18 @@
 		return nil
 	}
 
-	if _, task, ok := sp.getTask(appID, pod.UID); ok {
+	taskID := string(pod.UID)
+	if _, task, ok := sp.getTask(appID, taskID); ok {
 		if task.GetTaskState() == cache.TaskStates().Bound {
 			// attempt to start a pod allocation. Filter() gets called once per {Pod,Node} candidate; we only want
 			// to proceed in the case where the Node we are asked about matches the one YuniKorn has selected.
 			// this check is fairly cheap (one map lookup); if we fail the check here the scheduling framework will
 			// immediately call Filter() again with a different candidate Node.
-			if sp.context.StartPodAllocation(string(pod.UID), nodeInfo.Node().Name) {
+			if sp.context.StartPodAllocation(taskID, nodeInfo.Node().Name) {
 				log.Log(log.ShimSchedulerPlugin).Info("Releasing pod for scheduling (Filter phase)",
 					zap.String("namespace", pod.Namespace),
 					zap.String("pod", pod.Name),
-					zap.String("taskID", task.GetTaskID()),
+					zap.String("taskID", taskID),
 					zap.String("assignedNode", nodeInfo.Node().Name))
 				return nil
 			}
@@ -236,13 +238,14 @@
 		return
 	}
 
-	if _, task, ok := sp.getTask(appID, pod.UID); ok {
+	taskID := string(pod.UID)
+	if _, _, ok := sp.getTask(appID, taskID); ok {
 		log.Log(log.ShimSchedulerPlugin).Info("Managed Pod bound successfully",
 			zap.String("namespace", pod.Namespace),
 			zap.String("pod", pod.Name),
-			zap.String("taskID", task.GetTaskID()),
+			zap.String("taskID", taskID),
 			zap.String("assignedNode", nodeName))
-		sp.context.RemovePodAllocation(string(pod.UID))
+		sp.context.RemovePodAllocation(taskID)
 	}
 }
 
@@ -278,9 +281,9 @@
 	return nil, fmt.Errorf("internal error: serviceContext should implement interface api.SchedulerAPI")
 }
 
-func (sp *YuniKornSchedulerPlugin) getTask(appID string, taskID types.UID) (app interfaces.ManagedApp, task interfaces.ManagedTask, ok bool) {
+func (sp *YuniKornSchedulerPlugin) getTask(appID, taskID string) (interfaces.ManagedApp, interfaces.ManagedTask, bool) {
 	if app := sp.context.GetApplication(appID); app != nil {
-		if task, err := app.GetTask(string(taskID)); err == nil {
+		if task, err := app.GetTask(taskID); err == nil {
 			return app, task, true
 		}
 	}
@@ -288,12 +291,13 @@
 }
 
 func (sp *YuniKornSchedulerPlugin) failTask(pod *v1.Pod, app interfaces.ManagedApp, task interfaces.ManagedTask) {
+	taskID := task.GetTaskID()
 	log.Log(log.ShimSchedulerPlugin).Info("Task failed scheduling, marking as rejected",
 		zap.String("namespace", pod.Namespace),
 		zap.String("pod", pod.Name),
-		zap.String("taskID", task.GetTaskID()))
-	sp.context.RemovePodAllocation(string(pod.UID))
-	dispatcher.Dispatch(cache.NewRejectTaskEvent(app.GetApplicationID(), task.GetTaskID(),
-		fmt.Sprintf("task %s rejected by scheduler", task.GetTaskID())))
+		zap.String("taskID", taskID))
+	sp.context.RemovePodAllocation(taskID)
+	dispatcher.Dispatch(cache.NewRejectTaskEvent(app.GetApplicationID(), taskID,
+		fmt.Sprintf("task %s rejected by scheduler", taskID)))
 	task.SetTaskSchedulingState(interfaces.TaskSchedFailed)
 }

@wilfred-s
Copy link
Contributor

The predicate manager is the place in the code where we instantiate and manage the default scheduler plugins. It is used by both the standard and plugin mode and so is its own package. The EventsToRegister() function was previously a static list but this is fragile and doesn't cover all the events that the various plugins might need. Restructuring this is going to cause ugly circular dependencies.

I moved the scheduler_plugin.go file into the plugin top level directory without an issue all unit tests and compilations still ran and no circular dependencies were seen. That is a simple refactor.Scheduler plugin belongs in plugin...

Further refactor we can push out but I think the predicate manager should become a single instance outside of the context.

@craigcondit
Copy link
Contributor Author

I still don't like moving the scheduler_plugin code to plugins. The plugins package predates the scheduler plugin functionality and actually refers to plugins to YuniKorn itself (general, sparkoperator, etc.) so the meaning is inverted. When I implemented the scheduler plugin in the first place I explicitly did NOT reuse the package to avoid confusion.

@craigcondit
Copy link
Contributor Author

I still don't like moving the scheduler_plugin code to plugins. The plugins package predates the scheduler plugin functionality and actually refers to plugins to YuniKorn itself (general, sparkoperator, etc.) so the meaning is inverted. When I implemented the scheduler plugin in the first place I explicitly did NOT reuse the package to avoid confusion.

Seems I was operating from memory rather than actually looking at the code. I agree, moving scheduler_plugin.go to pkg/plugin makes sense. The appmgmt stuff is already separated out.

@craigcondit
Copy link
Contributor Author

Updated PR based on review comments, and rebased on latest master.

@craigcondit
Copy link
Contributor Author

@wilfred-s can you re-review?

Copy link
Contributor

@wilfred-s wilfred-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wilfred-s wilfred-s closed this in e054f65 Jul 18, 2023
@craigcondit craigcondit deleted the YUNIKORN-1844 branch July 18, 2023 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants