This repository has been archived by the owner on Sep 30, 2024. It is now read-only.
Hi @shlomi-noach , I think I might have found a bug during graceful-master-takeover process.
All slaves, prior to graceful-master-takeover starting, have the following: Auto_Position: 1. However, after graceful-master-takeover takes place, Auto_Position is set to 0, and further graceful failovers do not work until I set it back to 1.
I wrote a post graceful-master-takeover hook, which does the following:
Restarts the slave threads on the old master (now a slave).
Gets a list of all secondary slaves from the old master (hard-coded for now, as you can see below, since this is a proof of concept).
Moves the secondary slaves under the old master (now a slave) after the graceful failover.
Starts the slave threads on the secondary slaves.
#!/bin/bash
# Post graceful-master-takeover hook (proof of concept).
# ORC_FAILED_* is the demoted master; ORC_SUCCESSOR_* is the new master.

echo "Restarting slave threads on old master ${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"
orchestrator -c start-slave -i "${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"

echo "Getting list of secondary slaves from new master"
SEC_SLAVES=()
# The "po-mysql4" filter is hard-coded for this proof of concept.
# The successor's own port is used here (the original paired the successor host with ORC_FAILED_PORT).
for secondary_slave in $(orchestrator-client -c which-replicas -i "${ORC_SUCCESSOR_HOST}:${ORC_SUCCESSOR_PORT}" | grep po-mysql4)
do
  SEC_SLAVES+=("${secondary_slave}")
done

for ancillary_slave in "${SEC_SLAVES[@]}"
do
  echo "Making SECONDARY SLAVE ${ancillary_slave} a SLAVE of ${ORC_FAILED_HOST}"
  orchestrator -c relocate -i "${ancillary_slave}" -d "${ORC_FAILED_HOST}:${ORC_FAILED_PORT}"
  orchestrator -c start-slave -i "${ancillary_slave}"
done
Here are the before and after pictures. Notice that this only worked after I ran the following on the old master, once graceful-master-takeover had finished:
STOP SLAVE; CHANGE MASTER TO MASTER_AUTO_POSITION = 1; START SLAVE;
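The manual fix above could be guarded by a check, so that the statement is only issued when the demoted master really has lost auto-positioning. The following is a minimal sketch, not part of the original hook: the function names `auto_position_from_status` and `fix_sql_if_needed` are hypothetical, and it assumes the input is the text output of `SHOW SLAVE STATUS\G` piped in on stdin.

```shell
#!/bin/bash
# Hypothetical helper: print the Auto_Position value found in the
# output of `SHOW SLAVE STATUS\G` supplied on stdin.
auto_position_from_status() {
  awk -F': *' '$1 ~ /^[[:space:]]*Auto_Position$/ { print $2; exit }'
}

# Hypothetical helper: print the corrective statement from this thread
# only when the status on stdin reports Auto_Position: 0.
fix_sql_if_needed() {
  local value
  value="$(auto_position_from_status)"
  if [ "${value}" = "0" ]; then
    echo "STOP SLAVE; CHANGE MASTER TO MASTER_AUTO_POSITION = 1; START SLAVE;"
  fi
}

# Possible usage inside the hook (assumes the mysql client is installed
# and that credentials come from a defaults file):
#   mysql -h "${ORC_FAILED_HOST}" -P "${ORC_FAILED_PORT}" -e 'SHOW SLAVE STATUS\G' \
#     | fix_sql_if_needed
```

The sketch only prints the statement rather than executing it, so the output can be reviewed or piped to a `mysql` session as a separate, deliberate step.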
Thank you @almeida-pythian, I can confirm I'm able to reproduce this.
To be more specific: the demoted master, now returned as a replica, is set with auto_position=0 even if the topology is all using auto_position=1. The rest of the failover is fine and other replicas maintain their auto_position setting.
I'll look into it, but worth noting that the way Oracle implemented GTID calls for some confusion. Each replica chooses whether to auto_position or not. We can have a hybrid topology. And the master itself? It's not replicating; so does it use or does it not use GTID?
What if the master had one replica with auto_position=0 and one with auto_position=1? What should happen after failover?
Sigh. Yet another "try and think like a human" for orchestrator here, and yet another "no single solution to satisfy all cases and all users".
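The hybrid case raised above can at least be detected mechanically before deciding what the demoted master should do. The sketch below is an illustration only and does not describe orchestrator's behavior: `classify_auto_position` is a hypothetical function that takes one Auto_Position value per replica and reports whether the topology is uniformly on, uniformly off, or mixed.

```shell
#!/bin/bash
# Hypothetical helper: classify a set of Auto_Position values
# (one argument per replica) as "all", "none", or "mixed".
classify_auto_position() {
  local on=0 off=0 value
  for value in "$@"; do
    if [ "${value}" = "1" ]; then
      on=$((on + 1))
    else
      off=$((off + 1))
    fi
  done
  if [ "${off}" -eq 0 ] && [ "${on}" -gt 0 ]; then
    echo "all"      # every replica uses auto-positioning
  elif [ "${on}" -eq 0 ]; then
    echo "none"     # no replica uses auto-positioning (or none were given)
  else
    echo "mixed"    # the ambiguous hybrid topology
  fi
}
```

Under this sketch, only the "all" result gives an unambiguous answer for the demoted master; what to do for "mixed" remains the open question posed in the comment.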
Hi @shlomi-noach, thanks for looking into this. I'm sorry I posted to the other site (outbrain); my brain must have been out and I did not realize I was in the wrong place :-) Thanks for moving it here.
On behalf of @almeida-pythian, cross post from outbrain-inc/orchestrator#304
I have the following test scenario below:
Screenshots below show the before and after:
Here's my config:
Thanks for your help.