[BUG] Redis cluster nodes fail to join cluster due to FQDN DNS resolving lag #906

Open
leonliao opened this issue Aug 9, 2024 · 3 comments

@leonliao

leonliao commented Aug 9, 2024

Describe the bug
For Redis Cluster, the current addons/redis/redis-cluster-scripts/redis-cluster-server-start.sh uses the FQDN to add a node to the cluster. However, after a Redis node pod is rebuilt or created, joining the cluster can fail, either because the cached DNS entry is only refreshed after the redis-cli --cluster add-node command has already run, or because the new FQDN DNS entry only becomes resolvable after the command has run.
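One way to avoid racing DNS is to check resolution before running add-node. The following is a minimal sketch, not the addon's actual code: it assumes bash or another POSIX-style shell, glibc's getent ahosts, and that the pod knows its own IP (for example from a Downward API environment variable of your choosing). It waits until the node's FQDN resolves to that IP, which covers both cases described above: a stale record still pointing at the old pod, and a record that does not exist yet. The function names resolve_ips and wait_for_fqdn_ip are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: wait until a pod's own FQDN resolves to the pod's current IP
# before running `redis-cli --cluster add-node`. Function names are hypothetical.

# Print each address the system resolver returns for a name (assumes glibc getent).
resolve_ips() {
  getent ahosts "$1" 2>/dev/null | awk '{ print $1 }' | sort -u
}

# wait_for_fqdn_ip FQDN EXPECTED_IP MAX_ATTEMPTS INTERVAL_SECONDS
# Returns 0 as soon as EXPECTED_IP is among the FQDN's resolved addresses.
# Returns 1 after MAX_ATTEMPTS failed checks (record still stale or still missing).
wait_for_fqdn_ip() {
  local fqdn="$1" expected_ip="$2" max_attempts="$3" interval="$4"
  local attempt=1
  while :; do
    # -F: fixed string, -x: whole line, -q: quiet (exit status only)
    if resolve_ips "$fqdn" | grep -Fxq "$expected_ip"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "$fqdn did not resolve to $expected_ip after $attempt checks" >&2
      return 1
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
}
```

For example, the start script could call wait_for_fqdn_ip "$fqdn" "$pod_ip" 60 2 (roughly two minutes) before the add-node command and exit non-zero on failure so that the pod is restarted; fqdn and pod_ip are placeholders. Other pods may still be served a cached record for a short while after this pod sees the new one.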

To Reproduce

Simulate a pod leaving the cluster and rejoining it.

  1. Log in to a slave pod in the Redis Cluster and execute redis-cli --cluster del-node $current_node_ip_and_port $current_node_cluster_id, to simulate what addons/redis/redis-cluster-scripts/redis-cluster-replica-member-leave.sh does.
  2. Delete the slave pod
  3. There is a chance of seeing errors like the ones below:

# Stale DNS cache still points the FQDN to the old pod
+ current_node_with_port=redis-cluster-shard-kqr-0.redis-cluster-shard-kqr-headless.default.svc:6379
+ set +x
scale out replica replicated command: redis-cli --cluster add-node redis-cluster-shard-kqr-0.redis-cluster-shard-kqr-headless.default.svc:6379 redis-cluster-shard-kqr-1.redis-cluster-shard-kqr-headless.default.svc:6379 --cluster-slave --cluster-master-id 587b870ce5809309004f785d0837ed11f82a33e5 -a ********
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at redis-cluster-shard-kqr-0.redis-cluster-shard-kqr-headless.default.svc:6379: Connection timed out

# FQDN only becomes resolvable after the add-node command has run
+ current_node_with_port=redis-cluster-shard-btn-0.redis-cluster-shard-btn-headless.default.svc:6379
+ set +x
scale out replica replicated command: redis-cli --cluster add-node redis-cluster-shard-btn-0.redis-cluster-shard-btn-headless.default.svc:6379 redis-cluster-shard-btn-1.redis-cluster-shard-btn-headless.default.svc:6379 --cluster-slave --cluster-master-id 75203464beaf7403fc82eeeb24c98f9a9590a054 -a ********
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at redis-cluster-shard-btn-0.redis-cluster-shard-btn-headless.default.svc:6379: Name or service not known
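The two logs above fail with different messages that hint at different causes: "Connection timed out" when the name still resolves, possibly to the old pod's IP, and "Name or service not known" when the name does not resolve yet. A small helper could tell them apart so that a start script can log the likely cause and decide whether to retry. This is a hedged sketch: the function names and category labels are hypothetical, and a timeout can also have ordinary network causes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: map redis-cli connection errors (message strings copied
# from the logs in this issue) to a likely cause.

# classify_add_node_error MESSAGE -> prints a category label
classify_add_node_error() {
  case "$1" in
    *"Name or service not known"*) echo "fqdn-not-resolvable-yet" ;;
    # A timeout may mean a stale record, but could also be a plain network problem.
    *"Connection timed out"*)      echo "fqdn-possibly-stale" ;;
    *)                             echo "other" ;;
  esac
}

# is_transient_add_node_error MESSAGE -> exit status 0 if the error looks DNS-related
is_transient_add_node_error() {
  [ "$(classify_add_node_error "$1")" != "other" ]
}
```

A caller might capture the output with err=$(redis-cli --cluster add-node ... 2>&1) and pass "$err" to either helper.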

Expected behavior
The start script should retry add-node until it succeeds, or for long enough for the DNS change to take effect.
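The expected behavior amounts to a bounded retry. What follows is a minimal sketch of a retry helper with exponential backoff; retry_with_backoff and its parameters are hypothetical and not part of the KubeBlocks scripts. It deliberately does not echo the command line, because the add-node command carries the password via -a.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, doubling the delay
# between attempts up to a cap.
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS MAX_DELAY_SECONDS CMD [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" max_delay="$3"
  shift 3
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      # Do not print "$@": it may contain a password passed with -a.
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    # Double the delay, but never exceed max_delay.
    delay=$((delay * 2))
    if [ "$delay" -gt "$max_delay" ]; then
      delay="$max_delay"
    fi
  done
  return 0
}
```

For example: retry_with_backoff 10 2 30 redis-cli --cluster add-node "$current_node_with_port" "$primary_node_with_port" --cluster-slave --cluster-master-id "$primary_node_id" -a "$password", where every variable except current_node_with_port is a placeholder. Whether add-node can be re-run safely after a partial failure (for example, when the node was already added) has not been checked here.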

Screenshots
NA

Desktop:

  • OS: macOS
  • Version: kubeblocks-0.9.0
@weicao
Contributor

weicao commented Aug 11, 2024

I see. In your tests, if you delete the slave Pod of a shard, a new Pod will be created to join the shard. However, due to the time it takes for DNS to become effective, the redis-cli add-node command may result in an error.

@leonliao
Author

Yes. I think all addons that use FQDNs to identify nodes should be reviewed to check whether they have the same issue.

@weicao
Contributor

weicao commented Aug 22, 2024

Sure, this is a great suggestion for us.
