[sophora-server][sophora-cluster-common] update pre-stop hook to 2.0.0 (#87)

* [sophora-server] update pre-stop hook to 2.0.0 and adapt chart to the changes introduced by the new version

* fix cli call

* [sophora-cluster-common] add alert for situations with multiple primary servers and change alerting runbook to include the new ways to disable switching

* restore the GitHub warning markdown format
philmtd authored Apr 16, 2024
1 parent dc2d065 commit a7d263c
Showing 11 changed files with 152 additions and 31 deletions.
2 changes: 1 addition & 1 deletion charts/sophora-cluster-common/Chart.yaml
@@ -2,5 +2,5 @@ apiVersion: v2
name: sophora-cluster-common
description: A Helm chart containing some common resources useful for Sophora cloud setups
type: application
- version: 1.0.2
+ version: 1.1.0
appVersion: "4"
62 changes: 51 additions & 11 deletions charts/sophora-cluster-common/alerting-runbook.md
@@ -17,29 +17,69 @@ replication will happen to other running servers, if there are any.
* Check if the deployment has been uninstalled by mistake
* Check whether the server might have crashed
* Check the server logs for error messages
* Check if it would be possible to elect another cluster server to the primary. This should be done carefully to
  ensure no data is lost.
* Try to restart the server, if it is running but unresponsive
* Restore the server from a working backup

### SophoraServerNotInSync

**Severity:** high

**Summary:** The Sophora server is not in sync. This is concluded by comparing the server's *SourceTime* with the
SourceTime of the primary server. The SourceTime is the timestamp of the latest event that occurred on the primary
server. Usually, the SourceTimes of the servers should not diverge much and should stay equal when compared over a
short time frame.

**Remediation steps:**

* Check if the primary server logged a message containing "ReplicationMaster stopped" or "StagingMaster stopped". If
  yes: the primary server needs to be restarted. If "ReplicationMaster stopped" is logged, this needs to happen
  **without electing another server to the primary**. The last part is absolutely critical to preventing data loss.
  Depending on the version of the Server Helm Chart you are using, there are two options to ensure this (see the
  annotation sketch after this list):
  * Server Helm Chart 2.1.0 and later: give the server's Pod the
    annotation `prestop.server.sophora.cloud/switch-enabled: "false"`.
  * Before 2.1.0: as the servers automatically switch using a shutdown hook, a workaround is to exec into the
    container and replace the shutdown hook located in the `/tools/` directory with an empty executable file before
    restarting the server.

  Note that working with Sophora will not be possible for a few minutes during the restart. If the error persists,
  check the logs of the primary server for errors hinting at the root cause of the problem.
* Check if there is a large replication queue (e.g. due to a large amount of imports), which would result in a short
  replication delay
* Check whether the not-in-sync server is in an erroneous state and stopped receiving replication messages
* Check for network connection issues between the server and the primary server
* Check the server's and the primary server's logs for errors or warnings
* Restart the server
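
As a sketch, disabling the switch comes down to a single Pod annotation. Only the annotation key below is taken from
this runbook; how the annotation is attached (e.g. via the chart's pod template or `kubectl annotate`) depends on
your setup:

```yaml
# Fragment of the server Pod's metadata; the annotation is read by
# pre-stop hook 2.0.0 and later.
metadata:
  annotations:
    # "false" prevents this server from initiating a cluster switch
    # when it shuts down.
    prestop.server.sophora.cloud/switch-enabled: "false"
```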

### MultiplePrimarySophoraServers

**Severity:** critical

**Summary:** The Sophora Cluster has more than one server claiming to be the primary server. Write operations with
client tools will likely lead to inconsistencies in the entire Sophora cluster that need to be resolved manually.

**Remediation steps:**

* Check if a cluster switch is in progress and taking longer than expected to complete
* Restart all servers which should not be primary. To prevent these servers from switching automatically, give their
  pods the annotation `prestop.server.sophora.cloud/switch-enabled: "false"` (*)
* Check the server logs for error messages
* Make sure the servers are started in the correct order. Currently, servers can only have one remote server
  configured. This means that in a scenario with three or more cluster servers, a server may mistakenly assume it
  should start as primary.
* Make sure the PDB is configured to let only one cluster server be down at the same time. This should prevent
  multiple primaries if the remote servers of each server are configured correctly in a loop (e.g. 3 <- 1 <- 2 <- 3);
  see the PDB sketch below.
* If there are inconsistencies in the cluster (e.g. documents created in both primaries), see if these can be
  resolved manually. Otherwise, restore the servers from a backup.

(*) This works starting with Server Helm Chart 2.1.0, which ships pre-stop hook 2.0.0. On earlier versions, this can
only be achieved by replacing the shutdown hook located in the `/tools/` directory of the server's container with an
empty executable file.
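
A PodDisruptionBudget that allows at most one cluster server to be down at a time could look like the following
sketch (the name and selector label are placeholders, not taken from these charts):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sophora-cluster-servers  # placeholder name
spec:
  # Allow at most one cluster server to be voluntarily disrupted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: sophora-server  # placeholder label
```
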
@@ -28,6 +28,15 @@ spec:
          summary: Server is not in sync
          description: The server "{{`{{ $labels.pod }}`}}" is not in sync for more than 2 minutes.
          runbook_url: 'https://github.com/subshell/helm-charts/blob/main/charts/sophora-cluster-common/alerting-runbook.md'
      - alert: MultiplePrimarySophoraServers
        for: 2m
        expr: 'count(sophora_server_replication_mode == 1) > 1'
        labels:
          severity: critical
        annotations:
          summary: The Sophora Cluster has more than one server claiming to be the primary.
          description: There is more than one primary server in the cluster for more than 2 minutes.
          runbook_url: 'https://github.com/subshell/helm-charts/blob/main/charts/sophora-cluster-common/alerting-runbook.md'
{{- end }}
{{- with .Values.prometheusRules.rules }}
{{ tpl (toYaml .) $ | nindent 8 }}
2 changes: 1 addition & 1 deletion charts/sophora-server/Chart.yaml
@@ -15,7 +15,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
- version: 2.0.0
+ version: 2.1.0

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
61 changes: 47 additions & 14 deletions charts/sophora-server/README.md
@@ -10,7 +10,7 @@ In later chart versions this will be the default.

## Postgres connection

Starting with Sophora 5, the installation requires postgres.
You can provide credentials via a secret: `sophora.server.persistence.postgres.secret`.
To enable the postgres version store, set `sophora.server.persistence.postgres.versionStoreEnabled` to `true`.
For all other configuration options use `sophora.server.properties`.
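
For illustration, the values mentioned above might be wired together as in this sketch; the structure beneath
`secret` is an assumption, so check the chart's `values.yaml` for the authoritative layout:

```yaml
sophora:
  server:
    persistence:
      postgres:
        # Existing Secret holding the Postgres credentials (assumed keys).
        secret:
          name: sophora-postgres-credentials  # placeholder
        # Enables the postgres version store.
        versionStoreEnabled: true
```
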
@@ -24,9 +24,9 @@ It's also possible to use postgres as your jcr repository. To use postgres with
Cluster servers require one statefulset per instance. Deploy multiple statefulsets to create an actual Sophora cluster.
Therefore `replicaCount` only supports `0` and `1`.

#### Pod Anti-Affinity

To prevent multiple cluster servers from being scheduled on the same k8s node, you can use the podAntiAffinity. By
default, you can write the following in your values file:

```yaml
# (snippet collapsed in this diff view; see the sketch below)
```

@@ -53,33 +53,66 @@ You could also use a different `topologyKey` in order to make sure that deployments are spread also across unique zones or regions.

This is only necessary for cluster servers as there are usually only two of them, and you would want to ensure that
in case of a node failure, at least one cluster server remains running.
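
Since the snippet above is collapsed in this view, here is a sketch of what such a values stanza typically looks like
in Kubernetes terms; the placement of the `affinity` key and the label are assumptions, not taken from this chart:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: sophora-server  # placeholder label
        # Disallow two pods matching the label above on the same node.
        topologyKey: kubernetes.io/hostname
```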

#### Taint tolerations

Kubernetes allows selecting a node to schedule a pod based on different criteria. If one wants to make sure a pod is
only scheduled on a certain node, one shall set a node affinity. If the node shall be exclusive for this kind of pod,
there is the possibility to taint the node and provide the pods with a set of tolerations to tolerate the taint. In
Sophora, one may use this to provide a separate node pool for a certain type of Sophora servers exclusively.
Further information on how taints work: [kubernetes.io/Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#example-use-cases)
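
As an illustration only (the taint key, value, and node label are placeholders, not chart defaults), the pod spec for
an exclusive node pool could combine a toleration with a node affinity:

```yaml
# Assumes the nodes were tainted and labelled beforehand, e.g. with
#   sophora.example/server-pool=true:NoSchedule  (taint)
#   sophora.example/server-pool=true             (label)
tolerations:
  - key: "sophora.example/server-pool"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: sophora.example/server-pool
              operator: In
              values: ["true"]
```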

#### Server pre-stop lifecycle hook

All Sophora cluster servers are equipped with a pre-stop lifecycle hook that is executed when the pod is about to
shut down due to a user request, an uninstallation, the Kubernetes scheduler deciding to move it to another node, etc.

If the server to be shut down is the primary server, the hook will initiate a cluster switch to one of the other available
servers, if there are any. Before switching, it filters the list of available replicas to find those suitable to switch to.

The behaviour of the hook can be manipulated using the following **optional** annotations on the server's Pods:

1. `prestop.server.sophora.cloud/switch-enabled: "<true|false>"`
2. `prestop.server.sophora.cloud/is-switch-target: "<true|false>"`

The first annotation controls whether the server shutting down should switch.
In some edge cases, it might be useful to shut down a server without switching.

The second annotation can be used to specify whether the annotated server should be a valid switch target server.
If set to `false`, the tool will not switch to that server.

Both annotations default to `true` if they are not specified or their value cannot be parsed as a boolean, because
switches should generally happen and should only be deactivated for maintenance, recovery, or similar scenarios.

For this to work, the server's pod requires a service account with the permission to `get` and `list` Pods and services
in the namespace the server runs in. The SA, Role and Role Binding are created automatically.
The creation of these resources can be controlled with the `serviceAccount:` section in the values file.
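
As a sketch, both annotations could be set together, assuming the chart exposes a way to add pod annotations (the
`podAnnotations` key here is an assumption; check the chart's values file for the actual mechanism):

```yaml
podAnnotations:
  # This server may initiate a cluster switch when shutting down (default).
  prestop.server.sophora.cloud/switch-enabled: "true"
  # Other servers must not switch to this server.
  prestop.server.sophora.cloud/is-switch-target: "false"
```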

#### Server mode pod labels

Cluster servers run a sidecar container which continuously labels the pods with their server mode
to make it possible to create a service which always points to the current primary server.

For the sidecar to work, the server requires a service account with the permission to `get` and `patch` Pods
in the namespace the server runs in. The SA, Role and Role Binding are created automatically by this chart.
The creation of these resources can be controlled with the `serviceAccount:` section in the values file.
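
In RBAC terms, the permissions described above amount to a Role roughly like the following sketch (the name is a
placeholder; the chart derives the real one, suffixed with `-server-mode-labeler`):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sophora-server-server-mode-labeler  # placeholder name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    # Read the pod, then patch its server-mode label.
    verbs: ["get", "patch"]
```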


## Notable Changes

## 2.1.0
Updates the pre-stop hook to version 2.0.0 and configures it accordingly.
Please note that this now involves the creation of another Role and RoleBinding for this specific use case, so that
the hook can get information through the Kubernetes API. If you don't manage the Service Account through this Helm
chart, you may need to configure it manually to provide the required permissions.

## 2.0.0 (Breaking changes)

> [!WARNING]
> Please read this information carefully before updating!

* Renamed `serverModeLabeler.enabledOnClusterServers` to `serverModeLabeler.enabled`
* Removed `serverModeLabeler.createServiceAccount` in favour of `serviceAccount.create`
* Renamed `sidecars` to `extraContainers`
* Create `serviceAccount` by default even if `serverModeLabeler.enabled` is set to `false`
* Names of `Role` and `RoleBinding` have been suffixed with `-server-mode-labeler`.
16 changes: 16 additions & 0 deletions charts/sophora-server/templates/role-prestop-hook.yaml
@@ -0,0 +1,16 @@
{{- if and .Values.preStop.enabled .Values.serviceAccount.create -}}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "common.safeSuffixFullname" (list . "prestop-hook") }}
  labels: {{- include "sophora-server.labels" . | nindent 4 }}
rules:
  - apiGroups:
      - ""
    resources:
      - "pods"
      - "services"
    verbs:
      - "get"
      - "list"
{{- end }}
15 changes: 15 additions & 0 deletions charts/sophora-server/templates/rolebinding-prestop-hook.yaml
@@ -0,0 +1,15 @@
{{- if and .Values.preStop.enabled .Values.serviceAccount.create -}}
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "common.safeSuffixFullname" (list . "prestop-hook") }}
  labels: {{- include "sophora-server.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ include "common.safeSuffixFullname" (list . "prestop-hook") }}
subjects:
  - kind: ServiceAccount
    name: {{ include "sophora-server.fullname" . }}
    namespace: {{ .Release.Namespace }}
{{- end }}
14 changes: 11 additions & 3 deletions charts/sophora-server/templates/statefulset.yaml
@@ -245,20 +245,28 @@ spec:
optional: false
{{- end }}
{{ if and (eq .Values.sophora.server.isClusterServer true) (.Values.sophora.server.authentication.secret) -}}
- - name: SOPHORAUSERNAME
+ - name: SOPHORA_USERNAME # required for the preStop hook
valueFrom:
secretKeyRef:
key: {{ .Values.sophora.server.authentication.secret.usernameKey }}
name: {{ .Values.sophora.server.authentication.secret.name }}
optional: false
- - name: SOPHORAPASSWORD
+ - name: SOPHORA_PASSWORD # required for the preStop hook
valueFrom:
secretKeyRef:
key: {{ .Values.sophora.server.authentication.secret.passwordKey }}
name: {{ .Values.sophora.server.authentication.secret.name }}
optional: false
- name: LOG_MODE # used by the preStop hook to configure JSON logging
value: "prod"
- name: POD_NAME # required for the preStop hook
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE # required for the preStop hook
valueFrom:
fieldRef:
fieldPath: metadata.namespace
{{- end }}
{{ if .Values.sophora.server.env -}}
{{- toYaml .Values.sophora.server.env | nindent 12 }}
@@ -312,7 +320,7 @@ spec:
[
"/bin/sh",
"-c",
"/tools/sophora-prestop switch --serverUrl=http://localhost:1196 1> /proc/1/fd/1 2> /proc/1/fd/2",
"/tools/sophora-prestop switch --server-url=http://localhost:1196 1> /proc/1/fd/1 2> /proc/1/fd/2",
]
{{- end }}
resources:
2 changes: 1 addition & 1 deletion charts/sophora-server/values.yaml
@@ -14,7 +14,7 @@ preStop:
image:
repository: docker.subshell.com/tools/sophora-prestop
pullPolicy: IfNotPresent
tag: "1.2.0"
tag: "2.0.0"

serverModeLabeler:
image: