Permanent reconcile on emqx-core pods #998
Maybe related to #996?
Hi @rouke-broersma, I tried deploying EMQX the same way you did, and I can confirm that the pod's resourceVersion does get updated, but all of my pods are already ready. This is what I ran:
name="emqx-core-674b9b99c-0"
kubectl get pods "$name" -o json > pod1.json
# Poll once per second until the resourceVersion changes, then save the new version of the pod
while true; do
  if [[ "$(kubectl get pods "$name" -o json | jq -r '.metadata.resourceVersion')" != "$(jq -r '.metadata.resourceVersion' pod1.json)" ]]; then
    echo "resourceVersion has changed"
    kubectl get pods "$name" -o json > pod2.json
    break
  fi
  sleep 1
done
vimdiff pod1.json pod2.json
And I found that the diff is just the
@Rory-Z Yesterday the pods became ready without any problem, but today the operator does not set the readiness gate condition. I don't see why, because the pod logs say it is ready to receive traffic.
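One way to see which pods are being held back by the readiness gate is to filter a PodList with jq. This is a minimal sketch, assuming `jq` is installed; the function name `not_serving_pods` is hypothetical, and the namespace and label in the usage line are taken from the pod dumps in this thread.

```shell
# Hypothetical helper (not part of emqx-operator): print the name of every pod
# whose "apps.emqx.io/on-serving" readiness-gate condition is not "True".
# A pod that has no such condition at all is reported too.
# Input: a PodList on stdin, e.g. the output of `kubectl get pods -o json`.
not_serving_pods() {
  jq -r '
    .items[]
    | select(
        ([.status.conditions[]? | select(.type == "apps.emqx.io/on-serving") | .status][0]
          // "Missing") != "True"
      )
    | .metadata.name
  '
}

# Usage:
#   kubectl get pods -n emqx -l apps.emqx.io/instance=emqx -o json | not_serving_pods
```

Checking the gate condition rather than the container state matters here: a pod's `Ready` condition only becomes `True` when its containers are ready and every readiness gate is `True`, so a pod can be `Running` with `ContainersReady` set and still not be `Ready`.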
Could you please check whether anything in the pod differs between before and after the resourceVersion change? I think that would help me.
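To answer a request like this without reading two full dumps side by side, the resourceVersion (which changes on every write) can be dropped and only the leaf fields that differ printed. A sketch assuming jq 1.5 or later; `changed_paths` is a hypothetical name.

```shell
# Hypothetical helper: print every leaf path (as dot-separated keys/indices)
# whose value differs between two pod dumps, ignoring metadata.resourceVersion.
# $1 and $2 are JSON files such as pod1.json and pod2.json.
changed_paths() {
  jq -n -r --slurpfile a "$1" --slurpfile b "$2" '
    ($a[0] | del(.metadata.resourceVersion)) as $x
    | ($b[0] | del(.metadata.resourceVersion)) as $y
    # collect all leaf paths (including null/false leaves) from both documents
    | ([$x, $y] | map([paths(type != "object" and type != "array")]) | add | unique)[]
    | . as $p
    # getpath(...)? yields nothing instead of erroring on a type mismatch
    | select([$x | getpath($p)?] != [$y | getpath($p)?])
    | map(tostring)
    | join(".")
  '
}

# Usage:
#   changed_paths pod1.json pod2.json
```

Only leaf values are compared, so a key that exists with an empty object or array on one side only is not reported.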
@Rory-Z I think I have fixed the issue on my side now. The operator seemed to expect 1 replicant node, but the spec was set to 0. I had previously deleted the replicant pod, but the ReplicaSet still existed (with 0 live pods). I then updated the replica count for both core and replicant to 2 and deleted the replicant ReplicaSet. The operator recreated all core and replicant nodes, and I no longer saw constant reconciliation. When I scaled EMQX back down to 1 core node and 0 replicants, the reconciliation started again. Here are the diffs.
pod1.json:
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"creationTimestamp": "2024-01-09T11:42:54Z",
"generateName": "emqx-core-6bb54f9998-",
"labels": {
"apps.emqx.io/db-role": "core",
"apps.emqx.io/instance": "emqx",
"apps.emqx.io/managed-by": "emqx-operator",
"apps.emqx.io/pod-template-hash": "6bb54f9998",
"apps.kubernetes.io/pod-index": "0",
"controller-revision-hash": "emqx-core-6bb54f9998-5566b48f64",
"statefulset.kubernetes.io/pod-name": "emqx-core-6bb54f9998-0"
},
"name": "emqx-core-6bb54f9998-0",
"namespace": "emqx",
"ownerReferences": [
{
"apiVersion": "apps/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "StatefulSet",
"name": "emqx-core-6bb54f9998",
"uid": "4e3a4dba-43ac-45a8-80ce-884ac9b41f04"
}
],
"resourceVersion": "62026292",
"uid": "aae023ce-7cc7-46db-a18b-7910ad0b69f1"
},
"spec": {
"containers": [
{
"env": [
{
"name": "EMQX_DASHBOARD__LISTENERS__HTTP__BIND",
"value": "18083"
},
{
"name": "POD_NAME",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.name"
}
}
},
{
"name": "EMQX_CLUSTER__DISCOVERY_STRATEGY",
"value": "dns"
},
{
"name": "EMQX_CLUSTER__DNS__RECORD_TYPE",
"value": "srv"
},
{
"name": "EMQX_CLUSTER__DNS__NAME",
"value": "emqx-headless.emqx.svc.cluster.local"
},
{
"name": "EMQX_HOST",
"value": "$(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)"
},
{
"name": "EMQX_NODE__DATA_DIR",
"value": "data"
},
{
"name": "EMQX_NODE__ROLE",
"value": "core"
},
{
"name": "EMQX_NODE__COOKIE",
"valueFrom": {
"secretKeyRef": {
"key": "node_cookie",
"name": "emqx-node-cookie"
}
}
},
{
"name": "EMQX_API_KEY__BOOTSTRAP_FILE",
"value": "\"/opt/emqx/data/bootstrap_api_key\""
}
],
"image": "emqx/emqx:5.4.1",
"imagePullPolicy": "IfNotPresent",
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/status",
"port": "dashboard",
"scheme": "HTTP"
},
"initialDelaySeconds": 60,
"periodSeconds": 30,
"successThreshold": 1,
"timeoutSeconds": 1
},
"name": "emqx",
"ports": [
{
"containerPort": 18083,
"name": "dashboard",
"protocol": "TCP"
}
],
"readinessProbe": {
"failureThreshold": 12,
"httpGet": {
"path": "/status",
"port": "dashboard",
"scheme": "HTTP"
},
"initialDelaySeconds": 10,
"periodSeconds": 5,
"successThreshold": 1,
"timeoutSeconds": 1
},
"resources": {},
"securityContext": {
"runAsGroup": 1000,
"runAsNonRoot": true,
"runAsUser": 1000
},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/opt/emqx/data/bootstrap_api_key",
"name": "bootstrap-api-key",
"readOnly": true,
"subPath": "bootstrap_api_key"
},
{
"mountPath": "/opt/emqx/etc/emqx.conf",
"name": "bootstrap-config",
"readOnly": true,
"subPath": "emqx.conf"
},
{
"mountPath": "/opt/emqx/log",
"name": "emqx-core-log"
},
{
"mountPath": "/opt/emqx/data",
"name": "emqx-core-data"
},
{
"mountPath": "/mounted/cert",
"name": "emqx-tls"
},
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"name": "kube-api-access-lgjr4",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"enableServiceLinks": true,
"hostname": "emqx-core-6bb54f9998-0",
"nodeName": "talos-worker-0",
"preemptionPolicy": "PreemptLowerPriority",
"priority": 0,
"readinessGates": [
{
"conditionType": "apps.emqx.io/on-serving"
}
],
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {
"fsGroup": 1000,
"fsGroupChangePolicy": "Always",
"runAsGroup": 1000,
"runAsUser": 1000,
"supplementalGroups": [
1000
]
},
"serviceAccount": "default",
"serviceAccountName": "default",
"subdomain": "emqx-headless",
"terminationGracePeriodSeconds": 30,
"tolerations": [
{
"effect": "NoExecute",
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"tolerationSeconds": 300
},
{
"effect": "NoExecute",
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"tolerationSeconds": 300
}
],
"volumes": [
{
"emptyDir": {},
"name": "emqx-core-data"
},
{
"name": "bootstrap-api-key",
"secret": {
"defaultMode": 420,
"secretName": "emqx-bootstrap-api-key"
}
},
{
"configMap": {
"defaultMode": 420,
"name": "emqx-configs"
},
"name": "bootstrap-config"
},
{
"emptyDir": {},
"name": "emqx-core-log"
},
{
"name": "emqx-tls",
"secret": {
"defaultMode": 420,
"secretName": "mqtt-tls-certificate"
}
},
{
"name": "kube-api-access-lgjr4",
"projected": {
"defaultMode": 420,
"sources": [
{
"serviceAccountToken": {
"expirationSeconds": 3607,
"path": "token"
}
},
{
"configMap": {
"items": [
{
"key": "ca.crt",
"path": "ca.crt"
}
],
"name": "kube-root-ca.crt"
}
},
{
"downwardAPI": {
"items": [
{
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.namespace"
},
"path": "namespace"
}
]
}
}
]
}
}
]
},
"status": {
"conditions": [
{
"lastProbeTime": "2024-01-09T13:28:46Z",
"lastTransitionTime": "2024-01-09T13:28:46Z",
"status": "True",
"type": "apps.emqx.io/on-serving"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:42:54Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T13:23:34Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:43:05Z",
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:42:54Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "containerd://b5e6a9b8441e3a223e395618e57b0e78c395cd37c8b5eebfda56398d3943ea7f",
"image": "docker.io/emqx/emqx:5.4.1",
"imageID": "docker.io/emqx/emqx@sha256:6277190206f6669e451c8fe62336240dcfd7a87ab29443060737a0152f028249",
"lastState": {},
"name": "emqx",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2024-01-09T11:42:55Z"
}
}
}
],
"hostIP": "10.0.10.120",
"phase": "Running",
"podIP": "10.244.0.42",
"podIPs": [
{
"ip": "10.244.0.42"
}
],
"qosClass": "BestEffort",
"startTime": "2024-01-09T11:42:54Z"
}
}
pod2.json:
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"creationTimestamp": "2024-01-09T11:42:54Z",
"generateName": "emqx-core-6bb54f9998-",
"labels": {
"apps.emqx.io/db-role": "core",
"apps.emqx.io/instance": "emqx",
"apps.emqx.io/managed-by": "emqx-operator",
"apps.emqx.io/pod-template-hash": "6bb54f9998",
"apps.kubernetes.io/pod-index": "0",
"controller-revision-hash": "emqx-core-6bb54f9998-5566b48f64",
"statefulset.kubernetes.io/pod-name": "emqx-core-6bb54f9998-0"
},
"name": "emqx-core-6bb54f9998-0",
"namespace": "emqx",
"ownerReferences": [
{
"apiVersion": "apps/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "StatefulSet",
"name": "emqx-core-6bb54f9998",
"uid": "4e3a4dba-43ac-45a8-80ce-884ac9b41f04"
}
],
"resourceVersion": "62026491",
"uid": "aae023ce-7cc7-46db-a18b-7910ad0b69f1"
},
"spec": {
"containers": [
{
"env": [
{
"name": "EMQX_DASHBOARD__LISTENERS__HTTP__BIND",
"value": "18083"
},
{
"name": "POD_NAME",
"valueFrom": {
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.name"
}
}
},
{
"name": "EMQX_CLUSTER__DISCOVERY_STRATEGY",
"value": "dns"
},
{
"name": "EMQX_CLUSTER__DNS__RECORD_TYPE",
"value": "srv"
},
{
"name": "EMQX_CLUSTER__DNS__NAME",
"value": "emqx-headless.emqx.svc.cluster.local"
},
{
"name": "EMQX_HOST",
"value": "$(POD_NAME).$(EMQX_CLUSTER__DNS__NAME)"
},
{
"name": "EMQX_NODE__DATA_DIR",
"value": "data"
},
{
"name": "EMQX_NODE__ROLE",
"value": "core"
},
{
"name": "EMQX_NODE__COOKIE",
"valueFrom": {
"secretKeyRef": {
"key": "node_cookie",
"name": "emqx-node-cookie"
}
}
},
{
"name": "EMQX_API_KEY__BOOTSTRAP_FILE",
"value": "\"/opt/emqx/data/bootstrap_api_key\""
}
],
"image": "emqx/emqx:5.4.1",
"imagePullPolicy": "IfNotPresent",
"livenessProbe": {
"failureThreshold": 3,
"httpGet": {
"path": "/status",
"port": "dashboard",
"scheme": "HTTP"
},
"initialDelaySeconds": 60,
"periodSeconds": 30,
"successThreshold": 1,
"timeoutSeconds": 1
},
"name": "emqx",
"ports": [
{
"containerPort": 18083,
"name": "dashboard",
"protocol": "TCP"
}
],
"readinessProbe": {
"failureThreshold": 12,
"httpGet": {
"path": "/status",
"port": "dashboard",
"scheme": "HTTP"
},
"initialDelaySeconds": 10,
"periodSeconds": 5,
"successThreshold": 1,
"timeoutSeconds": 1
},
"resources": {},
"securityContext": {
"runAsGroup": 1000,
"runAsNonRoot": true,
"runAsUser": 1000
},
"terminationMessagePath": "/dev/termination-log",
"terminationMessagePolicy": "File",
"volumeMounts": [
{
"mountPath": "/opt/emqx/data/bootstrap_api_key",
"name": "bootstrap-api-key",
"readOnly": true,
"subPath": "bootstrap_api_key"
},
{
"mountPath": "/opt/emqx/etc/emqx.conf",
"name": "bootstrap-config",
"readOnly": true,
"subPath": "emqx.conf"
},
{
"mountPath": "/opt/emqx/log",
"name": "emqx-core-log"
},
{
"mountPath": "/opt/emqx/data",
"name": "emqx-core-data"
},
{
"mountPath": "/mounted/cert",
"name": "emqx-tls"
},
{
"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
"name": "kube-api-access-lgjr4",
"readOnly": true
}
]
}
],
"dnsPolicy": "ClusterFirst",
"enableServiceLinks": true,
"hostname": "emqx-core-6bb54f9998-0",
"nodeName": "talos-worker-0",
"preemptionPolicy": "PreemptLowerPriority",
"priority": 0,
"readinessGates": [
{
"conditionType": "apps.emqx.io/on-serving"
}
],
"restartPolicy": "Always",
"schedulerName": "default-scheduler",
"securityContext": {
"fsGroup": 1000,
"fsGroupChangePolicy": "Always",
"runAsGroup": 1000,
"runAsUser": 1000,
"supplementalGroups": [
1000
]
},
"serviceAccount": "default",
"serviceAccountName": "default",
"subdomain": "emqx-headless",
"terminationGracePeriodSeconds": 30,
"tolerations": [
{
"effect": "NoExecute",
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"tolerationSeconds": 300
},
{
"effect": "NoExecute",
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"tolerationSeconds": 300
}
],
"volumes": [
{
"emptyDir": {},
"name": "emqx-core-data"
},
{
"name": "bootstrap-api-key",
"secret": {
"defaultMode": 420,
"secretName": "emqx-bootstrap-api-key"
}
},
{
"configMap": {
"defaultMode": 420,
"name": "emqx-configs"
},
"name": "bootstrap-config"
},
{
"emptyDir": {},
"name": "emqx-core-log"
},
{
"name": "emqx-tls",
"secret": {
"defaultMode": 420,
"secretName": "mqtt-tls-certificate"
}
},
{
"name": "kube-api-access-lgjr4",
"projected": {
"defaultMode": 420,
"sources": [
{
"serviceAccountToken": {
"expirationSeconds": 3607,
"path": "token"
}
},
{
"configMap": {
"items": [
{
"key": "ca.crt",
"path": "ca.crt"
}
],
"name": "kube-root-ca.crt"
}
},
{
"downwardAPI": {
"items": [
{
"fieldRef": {
"apiVersion": "v1",
"fieldPath": "metadata.namespace"
},
"path": "namespace"
}
]
}
}
]
}
}
]
},
"status": {
"conditions": [
{
"lastProbeTime": "2024-01-09T13:29:16Z",
"lastTransitionTime": "2024-01-09T13:29:16Z",
"status": "True",
"type": "apps.emqx.io/on-serving"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:42:54Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T13:23:34Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:43:05Z",
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2024-01-09T11:42:54Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "containerd://b5e6a9b8441e3a223e395618e57b0e78c395cd37c8b5eebfda56398d3943ea7f",
"image": "docker.io/emqx/emqx:5.4.1",
"imageID": "docker.io/emqx/emqx@sha256:6277190206f6669e451c8fe62336240dcfd7a87ab29443060737a0152f028249",
"lastState": {},
"name": "emqx",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2024-01-09T11:42:55Z"
}
}
}
],
"hostIP": "10.0.10.120",
"phase": "Running",
"podIP": "10.244.0.42",
"podIPs": [
{
"ip": "10.244.0.42"
}
],
"qosClass": "BestEffort",
"startTime": "2024-01-09T11:42:54Z"
}
}
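Comparing the two dumps above, the fields that change besides `resourceVersion` appear to be the `lastProbeTime` and `lastTransitionTime` of the `apps.emqx.io/on-serving` condition (13:28:46Z in pod1.json, 13:29:16Z in pod2.json), while its `status` stays `True`. The sketch below extracts that condition's `lastTransitionTime` from two dumps and prints the gap in seconds. It assumes jq 1.5 or later (for `fromdateiso8601`); `on_serving_gap` is a hypothetical name, and it fails if either dump lacks the condition.

```shell
# Hypothetical helper: print the number of seconds between the lastTransitionTime
# of the "apps.emqx.io/on-serving" condition in two pod dumps ($1, then $2).
on_serving_gap() {
  jq -n -r --slurpfile a "$1" --slurpfile b "$2" '
    # take the first matching condition and parse its RFC 3339 UTC timestamp
    def gate_time:
      [.status.conditions[]? | select(.type == "apps.emqx.io/on-serving")][0]
      | .lastTransitionTime
      | fromdateiso8601;
    ($b[0] | gate_time) - ($a[0] | gate_time)
  '
}

# Usage:
#   on_serving_gap pod1.json pod2.json
```

On the two dumps in this thread this should print 30, i.e. the condition was rewritten 30 seconds apart even though its status never changed.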
@rouke-broersma it looks like just the
This fixed the issue, thanks!
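Because the reconciliation loop in this thread went away after a leftover replicant ReplicaSet (with 0 live pods) was deleted, and came back after scaling replicants down to 0, it can help to list the ReplicaSets with their desired and ready counts. A minimal sketch assuming jq; `rs_summary` is a hypothetical name, and the usage line assumes the operator puts the `apps.emqx.io/instance` label on its ReplicaSets as it does on the pods above.

```shell
# Hypothetical helper: print "name<TAB>desired<TAB>ready" for each ReplicaSet
# so that one with no live pods stands out. Missing counts are shown as 0.
# Input: a ReplicaSetList on stdin, e.g. the output of `kubectl get rs -o json`.
rs_summary() {
  jq -r '
    .items[]
    | [.metadata.name, (.spec.replicas // 0), (.status.readyReplicas // 0)]
    | map(tostring)
    | join("\t")
  '
}

# Usage:
#   kubectl get rs -n emqx -l apps.emqx.io/instance=emqx -o json | rs_summary
```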
Describe the bug
Similar to #982, but now I am seeing constant updates on the emqx-core pods.
To Reproduce
Cluster config: https://github.com/broersma-forslund/homelab/blob/main/apps/emqx/templates/cluster.yaml
Anything else we need to know?
When I turn off the operator, the changes stop, so the operator is causing this.
The operator does not log anything about these changes, even with debug logging turned on:
The pod does not become ready, seemingly because the operator is constantly busy updating it:
The status of pod readiness gate "apps.emqx.io/on-serving" is False.
However, the pod logs indicate that the pod is ready to serve:
Environment details:
@Rory-Z happy to provide you with any information you need to investigate this.