postgres@psql-09:/root$ repmgr standby switchover --siblings-follow --dry-run
NOTICE: checking switchover on node "psql-09" (ID: 9) in --dry-run mode
INFO: SSH connection to host "10.10.10.7" succeeded
INFO: able to execute "repmgr" on remote host "10.10.10.7"
INFO: all sibling nodes are reachable via SSH
INFO: 4 walsenders required, 20 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 2 seconds
INFO: 4 replication slots required, 20 available
NOTICE: attempting to pause repmgrd on 5 nodes
NOTICE: local node "psql-09" (ID: 9) would be promoted to primary; current primary "psql-07" (ID: 7) would be demoted to standby
INFO: following shutdown command would be run on node "psql-07":
"sudo /usr/bin/pg_ctlcluster 15 main stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
However, psql-09 (a more powerful server) was configured with max_worker_processes=64, while psql-07 had only max_worker_processes=32. When we actually performed the switchover, we ended up in a limbo state in which none of the replicas could rejoin, because they failed to restart due to the mismatch in that parameter:
Aug 21 22:11:47 psql-08 postgres[4082218]: [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
Aug 21 22:11:47 psql-08 postgres[4082221]: [1] LOG: database system was interrupted while in recovery at log time 2023-08-21 21:49:15 UTC
Aug 21 22:11:47 psql-08 postgres[4082221]: [2] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
Aug 21 22:11:48 psql-08 postgres[4082221]: [1] LOG: entering standby mode
Aug 21 22:11:48 psql-08 postgres[4082221]: [1] FATAL: recovery aborted because of insufficient parameter settings
Aug 21 22:11:48 psql-08 postgres[4082221]: [2] DETAIL: max_worker_processes = 32 is a lower setting than on the primary server, where its value was 64.
Aug 21 22:11:48 psql-08 postgres[4082221]: [3] HINT: You can restart the server after making the necessary configuration changes.
Aug 21 22:11:48 psql-08 postgres[4082218]: [1] LOG: startup process (PID 4082221) exited with exit code 1
Aug 21 22:11:48 psql-08 postgres[4082218]: [1] LOG: aborting startup due to startup process failure
Aug 21 22:11:48 psql-08 postgres[4082218]: [1] LOG: database system is shut down
It's unexpected that the dry run did not catch this 😬
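To illustrate the kind of check that would have flagged this before the switchover, here is a minimal sketch in Python. It is not part of repmgr; the function name find_undersized_settings and the input dictionaries are hypothetical, and the values are assumed to have been collected beforehand (for example by running SHOW max_worker_processes on each node). The parameter list reflects the settings PostgreSQL requires to be at least as high on a hot standby as on its primary.

```python
# Settings that must be >= the primary's value on every hot standby,
# otherwise the standby aborts recovery at startup.
STANDBY_SENSITIVE_PARAMS = (
    "max_connections",
    "max_prepared_transactions",
    "max_locks_per_transaction",
    "max_wal_senders",
    "max_worker_processes",
)


def find_undersized_settings(candidate, siblings):
    """Compare the promotion candidate's settings with every other node.

    candidate: dict of parameter name -> int for the node to be promoted.
    siblings:  dict of node name -> dict of parameter name -> int
               (hypothetical input, gathered separately from each node).

    Returns a list of (node, parameter, node_value, candidate_value)
    tuples for every setting that is lower than on the candidate.
    """
    problems = []
    for node, settings in sorted(siblings.items()):
        for param in STANDBY_SENSITIVE_PARAMS:
            # Skip parameters that were not collected for both nodes.
            if param not in candidate or param not in settings:
                continue
            if settings[param] < candidate[param]:
                problems.append(
                    (node, param, settings[param], candidate[param])
                )
    return problems


if __name__ == "__main__":
    # Values taken from this report: psql-09 is the promotion candidate.
    candidate = {"max_worker_processes": 64}
    siblings = {
        "psql-07": {"max_worker_processes": 32},
        "psql-08": {"max_worker_processes": 32},
    }
    for node, param, have, need in find_undersized_settings(candidate, siblings):
        print(f"{node}: {param} = {have} is lower than {need} on the promotion candidate")
```

Any non-empty result would mean that the listed node could not start as a standby of the promoted node until its configuration was raised.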
For context, we were carrying out a switchover of our primary using repmgr 5.3.3. The dry run was invoked as:
sudo -u postgres repmgr standby switchover --siblings-follow --dry-run