There is no plugin for podAffinity violations #1462

Open
tuxillo opened this issue Jul 10, 2024 · 5 comments

Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

tuxillo commented Jul 10, 2024

Is your feature request related to a problem? Please describe.

I think it is a problem because, as far as I know, there is no way of rescheduling pods that are in violation of podAffinity.

Describe the solution you'd like

Make a plugin available that deals with podAffinity violations. 🥲

Describe alternatives you've considered
Right now I'm manually doing rollout restarts on deployments so that the scheduling is right, for example after upgrades.
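Concretely, something like this (deployment name illustrative):

    kubectl rollout restart deployment/appB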

What version of descheduler are you using?

descheduler version:
latest
Additional context

Issue #1286 was closed as "Not planned", but feedback was provided there and I think this feature is very much needed.

tuxillo added the kind/feature label Jul 10, 2024
tuxillo commented Aug 8, 2024

Any news?

damemi commented Aug 8, 2024

Hi @tuxillo, sorry, but we don't have much bandwidth for adding new plugins at this time. The descheduling framework refactor is meant to make it easier for anyone to develop their own plugins. The existing plugin code should serve as a guide, but we hope to publish a more generic developer guide in the future.

tuxillo commented Aug 9, 2024

@damemi thanks for the update. No worries! I was wondering whether this plugin was missing for any particular reason. I might give implementing it a try, but should I wait for the refactor you're mentioning?

damemi commented Aug 9, 2024

@tuxillo please feel free to implement it and open a PR if you'd like! The refactor is already done. Ideally, the goal was to make it so you don't even need to contribute the plugin code here if you don't want to; you can just build your own descheduler with your plugin imported.
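As a rough illustration of what such a plugin could look like, here is a minimal sketch. The interface below is a simplified stand-in, not the framework's actual API (the real plugin interfaces live under sigs.k8s.io/descheduler/pkg/framework), so treat the names and method set as assumptions:

package podaffinity

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// DeschedulePlugin is a simplified stand-in for the descheduler framework's
// plugin interface; the exact method set here is an assumption.
type DeschedulePlugin interface {
	Name() string
	Deschedule(ctx context.Context, nodes []*v1.Node) error
}

// RemovePodsViolatingPodAffinity is a hypothetical plugin that would evict
// pods whose required podAffinity terms are no longer satisfied.
type RemovePodsViolatingPodAffinity struct{}

func (p *RemovePodsViolatingPodAffinity) Name() string {
	return "RemovePodsViolatingPodAffinity"
}

func (p *RemovePodsViolatingPodAffinity) Deschedule(ctx context.Context, nodes []*v1.Node) error {
	for _, node := range nodes {
		// For each node: list the pods assigned to it, run a predicate that
		// checks each pod's required podAffinity terms, and request eviction
		// for the violators.
		fmt.Printf("would inspect pods on node %s\n", node.Name)
	}
	return nil
}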

tuxillo commented Sep 22, 2024

I did an initial version using RemovePodsViolatingInterPodAntiAffinity as the base, but as it turns out I ended up just reversing its logic, which is almost certainly the wrong approach, and also doesn't sound worthy of a whole plugin.

Let me explain my use case:

I set application A to run on certain nodes. I want application B to always run on nodes where application A is, so for application B I use:

    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - appA
            topologyKey: kubernetes.io/hostname

As you know, app A will sometimes drift to other nodes (Karpenter is autoscaling), but app B won't follow, and the affinity is broken.

What I am currently doing in predicates is:

// CheckPodAffinityViolation returns true when candidatePod declares required
// podAffinity terms that no assigned pod satisfies in the candidate's
// topology domain (i.e. the affinity is broken), and false when the affinity
// is satisfied or cannot be evaluated.
func CheckPodAffinityViolation(candidatePod *v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
	nodeHavingCandidatePod, ok := nodeMap[candidatePod.Spec.NodeName]
	if !ok {
		klog.Warningf("CandidatePod %s does not exist in nodeMap", klog.KObj(candidatePod))
		return false
	}

	// Pods without podAffinity cannot be in violation.
	affinity := candidatePod.Spec.Affinity
	if affinity == nil || affinity.PodAffinity == nil {
		return false
	}

	for _, term := range GetPodAffinityTerms(affinity.PodAffinity) {
		namespaces := GetNamespacesFromPodAffinityTerm(candidatePod, &term)
		selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
		if err != nil {
			klog.ErrorS(err, "Unable to convert LabelSelector into Selector")
			return false
		}

		for namespace := range namespaces {
			for _, assignedPod := range assignedPods[namespace] {
				// Skip the candidate itself and pods the term doesn't select.
				if assignedPod.Name == candidatePod.Name || !PodMatchesTermsNamespaceAndSelector(assignedPod, namespaces, selector) {
					klog.V(4).InfoS("CandidatePod does not match inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
					continue
				}

				nodeHavingAssignedPod, ok := nodeMap[assignedPod.Spec.NodeName]
				if !ok {
					continue
				}

				// A matching pod shares the candidate's topology domain
				// (e.g. the same node for kubernetes.io/hostname), so the
				// affinity is satisfied and there is no violation.
				if hasSameLabelValue(nodeHavingCandidatePod, nodeHavingAssignedPod, term.TopologyKey) {
					klog.V(1).InfoS("CandidatePod matches inter-pod affinity rule of assigned pod on node", "candidatePod", klog.KObj(candidatePod), "assignedPod", klog.KObj(assignedPod))
					return false
				}
			}
		}
	}
	// No assigned pod satisfied any required term: the affinity is violated.
	klog.V(1).Infof("CandidatePod did not match inter-pod affinity in node %v", candidatePod.Spec.NodeName)
	return true
}

So when the candidate pod and a matching pod are on the same node (since that's the topology key), it just isn't evicted. I think I'm missing some parts of the problem, or I'm not following the correct approach.

Any pointers would be appreciated 😄
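For reference, a minimal sketch of how this predicate might be driven, assuming a hypothetical evictPod helper in place of the descheduler's real eviction machinery:

// descheduleAffinityViolators walks all assigned pods and requests eviction
// for those whose required podAffinity is broken. evictPod is a hypothetical
// stand-in for the descheduler's eviction path.
func descheduleAffinityViolators(assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node, evictPod func(*v1.Pod) error) {
	for _, pods := range assignedPods {
		for _, pod := range pods {
			// Only pods whose required podAffinity is unsatisfied on their
			// current node are eviction candidates.
			if !CheckPodAffinityViolation(pod, assignedPods, nodeMap) {
				continue
			}
			if err := evictPod(pod); err != nil {
				klog.ErrorS(err, "Failed to evict pod", "pod", klog.KObj(pod))
			}
		}
	}
}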
