Node label ordering is ambiguous, triggering rolling upgrades #319

Open
DavidFair opened this issue Apr 25, 2024 · 0 comments
@DavidFair
Contributor

Using node group defaults, we've added an additional label:

  nodeGroupDefaults:
    nodeLabels:
      longhorn.store.nodeselect/longhorn-storage-node: true

This correctly propagates through to the KubeadmConfigTemplate; however, the label ordering is ambiguous, which causes each KubeadmConfigTemplate to be duplicated:

kubectl get kubeadmconfigtemplate -n clusters
NAME                                     AGE
jupyter-training-default-md-0-2fc04e89   45h
jupyter-training-default-md-0-625f3cc7   47h
jupyter-training-md-a4000-20f6d2bf       45h
jupyter-training-md-a4000-e1d949cf       47h
jupyter-training-md-a4000-ref-48ab2a00   47h
jupyter-training-md-a4000-ref-de3ae272   25h
jupyter-training-md-rtx4000-493dc31a     46h
jupyter-training-md-rtx4000-b4c1b7fc     45h
prod-mgmt-default-md-0-2fc04e89          2d2h
prod-mgmt-default-md-0-9b486d80          2d2h
prod-worker-default-md-0-2fc04e89        25h
vm-prod-default-md-0-2fc04e89            27h

Any chart change then triggers the existing MachineDeployments to roll over to the other variant of the template, depending on the order in which the labels are rendered. Here's the offending diff between two templates:

          Kubelet Extra Args:
            Cloud - Provider:  external
            Node - Labels:     longhorn.store.nodeselect/longhorn-storage-node=true,capi.stackhpc.com/node-group=md-a4000-ref
          Kubelet Extra Args:
            Cloud - Provider:  external
            Node - Labels:     capi.stackhpc.com/node-group=md-a4000-ref,longhorn.store.nodeselect/longhorn-storage-node=true
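To see why a pure reordering is enough to roll the nodes: the template names above end in an 8-character suffix that looks like a content hash. Assuming that suffix is derived by hashing the rendered spec (the truncated SHA-256 below is an assumption for illustration, not the chart's verified logic), two label strings containing the same labels in a different order produce different suffixes, so a "new" template appears whenever the order flips:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// shortHash returns the first 8 hex characters of the SHA-256 digest of s.
// ASSUMPTION: this only mimics the 8-character suffixes seen in the
// template names; it is not the chart's actual naming/hashing logic.
func shortHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])[:8]
}

func main() {
	a := "longhorn.store.nodeselect/longhorn-storage-node=true,capi.stackhpc.com/node-group=md-a4000-ref"
	b := "capi.stackhpc.com/node-group=md-a4000-ref,longhorn.store.nodeselect/longhorn-storage-node=true"
	// Same set of labels, different order: the suffixes differ, so a
	// differently named template would be generated for each ordering.
	fmt.Println(shortHash(a), shortHash(b), shortHash(a) == shortHash(b))
}
```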

I can see where our values get merged: https://github.com/stackhpc/capi-helm-charts/blob/main/charts/openstack-cluster/templates/node-group/kubeadm-config-template.yaml#L18, where the node group overrides and the defaults are combined.

Unfortunately, dicts are unordered, so this will randomly toggle the order of our labels (and I suspect anywhere else we concatenate a dict into a string will do the same).

My idea is to change https://github.com/stackhpc/capi-helm-charts/blob/main/charts/openstack-cluster/templates/_helpers.tpl#L134 to sort the keys and then index into the dict, starting with something like:

  node-labels: "{{ range $ik := (keys . | uniq | sortAlpha) }}

but I'm a bit stumped on how to pull the values out afterwards, or whether this is even the right approach, since I suspect there are other places where the ordering matters for the template hash.
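On pulling the values out: in Go templates `.` is rebound inside `range`, so one common pattern is to save the dict to a variable first and look each value up with the built-in `index`. The sketch below uses only the Go standard library, so a hypothetical `sortedKeys` function stands in for Sprig's `keys . | sortAlpha`; the template string is the part that would carry over to the chart, and whether this is the right fix there remains the open question above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"text/template"
)

// sortedKeys is a hypothetical stand-in for Sprig's `keys . | sortAlpha`,
// which is not part of the Go standard library.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// Save the dict in $labels before ranging (inside the range `.` becomes the
// current key), then fetch each value with `index`. `if $i` skips the comma
// before the first entry because 0 is falsy in templates.
const nodeLabelsTmpl = `{{- $labels := . -}}
{{- range $i, $k := sortedKeys $labels -}}
{{- if $i }},{{ end }}{{ $k }}={{ index $labels $k }}
{{- end -}}`

// renderNodeLabels executes the template against a label map.
func renderNodeLabels(labels map[string]string) (string, error) {
	tmpl, err := template.New("node-labels").
		Funcs(template.FuncMap{"sortedKeys": sortedKeys}).
		Parse(nodeLabelsTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, labels); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := renderNodeLabels(map[string]string{
		"longhorn.store.nodeselect/longhorn-storage-node": "true",
		"capi.stackhpc.com/node-group":                    "md-a4000-ref",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In the chart itself, Sprig's `get $dict $key` should work in place of `index` for string-keyed dicts. It may also be worth checking how L134 iterates today: Go's `text/template` visits map entries in sorted key order for `range $k, $v := $dict`, whereas Sprig documents that `keys` returns keys in no predictable order.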
