Environmental Info:
RKE2 Version: v1.28.8+rke2r1

:~ # rke2 -v
rke2 version v1.28.8+rke2r1 (42cab2f)
go version go1.21.8 X:boringcrypto

Node(s) CPU architecture, OS, and Version:
Linux hostname 5.3.18-150300.59.161-default #1 SMP Thu May 9 06:59:05 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
3 master and 3 worker nodes

Describe the bug:
We are trying to upgrade RKE2 from v1.28.8+rke2r1 (fresh install) to v1.28.12+rke2r1 / v1.29.6+rke2r1.
After the upgrade the rke2 service comes up, but all the Helm jobs for the Calico system components fail. The Helm jobs are retriggered in a continuous loop (possibly trying to upgrade the components above).
For some reason, instead of upgrading the Calico chart, it tries to uninstall the tigera-operator CRDs and the Calico CRDs. In the process it hangs because the resources are still present. See the log output for the Calico CRD job below.

kubectl get crds | grep -i calico --> No result
kubectl logs job/helm-install-rke2-calico-crd -n kube-system -f
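The stuck state can also be inspected from the helm-controller side. A minimal sketch, assuming the default kube-system namespace and the standard rke2-calico / rke2-calico-crd chart names:

# List the HelmChart resources managed by RKE2's embedded helm-controller
kubectl get helmcharts -n kube-system

# Check the install/uninstall jobs the controller creates for each chart
kubectl get jobs -n kube-system | grep helm-install

# Helm v3 keeps release state in secrets; a release stuck in "uninstalling"
# shows up in the status label here
kubectl get secrets -n kube-system -l owner=helm,name=rke2-calico-crd --show-labels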
It looks like the Helm job to upgrade the chart was interrupted partway through. The helm controller responded by trying to uninstall and reinstall the chart, but the uninstall job was also interrupted, so now the chart is stuck in the "uninstalling" status.
You might try deleting the Helm secrets for the rke2-calico-crd release, and for rke2-calico as well if necessary. This should allow it to successfully reinstall the chart.
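In case it helps, a minimal sketch of that cleanup, assuming the kube-system namespace and the standard labels Helm v3 puts on its release secrets:

# Find the release secrets Helm v3 keeps for the stuck charts
kubectl get secrets -n kube-system -l owner=helm,name=rke2-calico-crd
kubectl get secrets -n kube-system -l owner=helm,name=rke2-calico

# Delete them so the helm-controller can reinstall the chart from a clean slate
kubectl delete secret -n kube-system -l owner=helm,name=rke2-calico-crd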
What process did you use to upgrade your cluster? We do not generally see the Helm jobs being interrupted during upgrades, unless the upgrade itself is interrupted partway through, leaving nodes deploying conflicting component versions.
Was there any recovery from this? We ran into this issue yesterday and had to restore the controller VM and etcd from snapshots.
The symptoms and logs match exactly what was posted above. We initially attempted to install the CRDs and recreate the required resources, but the Calico controller continued to crashloop.
Ultimately, the restore from snapshots worked, but we actually had to do it twice: after we added additional controllers, the Helm upgrade was re-triggered and we had to restart the process. We're currently running with just the one controller, which is not an ideal state.
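For reference, the restore followed the documented RKE2 etcd snapshot restore flow; roughly this, with the snapshot name as a placeholder:

# Stop the rke2 server on the node being restored
systemctl stop rke2-server

# Reset the cluster from a local etcd snapshot (placeholder snapshot name)
rke2 server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name>

# Start the service again once the reset completes
systemctl start rke2-server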
This is probably projectcalico/calico#9068, which was fixed upstream, but you will likely have to wait quite some time for the fix to become available in RKE2 and Rancher.
@brandond is there any possibility in RKE2 of overriding the Calico version being deployed?
The issue is in the chart itself, so no, you can't just bump the version of Calico that the chart deploys. You'll need to wait for us to update the chart in RKE2.