[BUG] Redis cluster nodes fail to join cluster due to FQDN DNS resolving lag #906
Comments
I see. In your tests, if you delete the slave Pod of a shard, a new Pod is created to join the shard. However, because DNS takes time to become effective, the redis-cli add-node command may result in an error.
Yes. I think all addons that use FQDNs to identify nodes should be reviewed to check whether they have the same issue.
Sure, this is a great suggestion for us.
Describe the bug
For Redis Cluster, the current addons/redis/redis-cluster-scripts/redis-cluster-server-start.sh uses the FQDN to add a node to the cluster. After a Redis node pod is rebuilt or created, the cluster join can fail, either because the cached DNS entry is refreshed only after the cluster add-node command has run, or because the new FQDN's DNS entry becomes resolvable only after the command has run.
To Reproduce
Simulate a pod leaving the cluster and rejoining it. Run
redis-cli --cluster del-node $current_node_ip_and_port $current_node_cluster_id
which simulates addons/redis/redis-cluster-scripts/redis-cluster-replica-member-leave.sh.
Expected behavior
Retry add-node more times, until it succeeds or until enough time has passed for DNS to take effect.
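A minimal sketch of what such a retry could look like, written in POSIX shell. The helper names `retry_with_delay` and `fqdn_resolvable`, the attempt counts, the delays, and the variable names in the usage comment are all hypothetical and are not taken from redis-cluster-server-start.sh. It also assumes that `getent` is available in the image and that `redis-cli --cluster add-node` exits non-zero when the join fails.

```shell
#!/bin/sh
# Hypothetical helper (not part of the addon scripts): run a command
# repeatedly until it succeeds or the attempt budget is used up.
# Usage: retry_with_delay MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_with_delay() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_attempt=1
  while [ "$_retry_attempt" -le "$_retry_max" ]; do
    # Run the remaining arguments as the command to retry.
    if "$@"; then
      return 0
    fi
    echo "attempt $_retry_attempt/$_retry_max failed: $*" >&2
    # Sleep between attempts, but not after the last one.
    if [ "$_retry_attempt" -lt "$_retry_max" ]; then
      sleep "$_retry_delay"
    fi
    _retry_attempt=$((_retry_attempt + 1))
  done
  return 1
}

# Hypothetical helper: succeeds only if the given name resolves locally.
# Assumes getent exists in the container image.
fqdn_resolvable() {
  getent hosts "$1" >/dev/null 2>&1
}

# Example usage (placeholder variable names and port, shown as comments so
# that sourcing this file has no side effects):
#   retry_with_delay 30 2 fqdn_resolvable "$new_node_fqdn" &&
#     retry_with_delay 10 3 redis-cli --cluster add-node \
#       "$new_node_fqdn:6379" "$existing_node_fqdn:6379"
```

Waiting for the FQDN to resolve locally covers only the case where the new DNS entry is not yet published. A stale cached entry on another node would not be detected this way, which is why the add-node call itself is also retried in the usage example.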
Screenshots
NA