DSR not working with Rocky / RHEL: Unavailable desc = name resolver error: produced zero addresses #1754

Closed
opipenbe opened this issue Oct 20, 2024 · 4 comments · Fixed by #1756

opipenbe commented Oct 20, 2024

What happened?

DSR mode is not working on Rocky Linux / RHEL 9 when using kube-router v2.2.0 or v2.2.1.

In the kube-router logs:
E1020 08:05:55.567390 372833 service_endpoints_sync.go:60] Error setting up IPVS services for service external IP's and load balancer IP's: failed to setup DSR endpoint 30.0.0.1: unable to setup DSR receiver inside pod: failed to prepare endpoint 192.168.6.78 to do DSR due to: rpc error: code = Unavailable desc = name resolver error: produced zero addresses

DSR works successfully with kube-router v2.1.3 on Rocky Linux 9.4; the error above occurs with kube-router v2.2.0 and v2.2.1, so I believe a change between v2.1.3 and v2.2.0 introduced this DSR incompatibility. I also tested DSR with Ubuntu 24.04 and kube-router v2.2.x in the same cluster, and it does not have this issue.

What did you expect to happen?

DSR mode to work without errors on RHEL and its clones.

How can we reproduce the behavior you experienced?

Steps to reproduce the behavior:

  1. Install a kubeadm Kubernetes cluster without kube-proxy and with the CRI-O runtime.
  2. Deploy the latest kube-router v2.2.1 using kubeadm-kuberouter-all-features-dsr.yaml (https://github.com/cloudnativelabs/kube-router/blob/master/daemonset/kubeadm-kuberouter-all-features-dsr.yaml).
  3. Make the following changes in kubeadm-kuberouter-all-features-dsr.yaml:
  • set --runtime-endpoint=unix:///run/crio/crio.sock
  • replace /var/run/docker.sock with /run/crio/crio.sock in volumeMounts and volumes configuration
  • instead of:
      - name: kubeconfig
        configMap:
          name: kube-proxy
          items:
          - key: kubeconfig.conf
            path: kubeconfig

replace with:

      - name: kubeconfig
        hostPath:
          path: /var/lib/kube-router

System Information (please complete the following information)

  • Kube-Router Version (kube-router --version): v2.2.0 and v2.2.1
  • Kube-Router Parameters: - --run-router=true - --run-firewall=true - --run-service-proxy=true - --bgp-graceful-restart=true - --kubeconfig=/var/lib/kube-router/kubeconfig - --runtime-endpoint=unix:///run/crio/crio.sock
  • Kubernetes Version (kubectl version): k8s v1.31.1, CRI-O 1.31.1
  • Cloud Type: on premise
  • Kubernetes Deployment Type: kubeadm
  • Kube-Router Deployment Type: DaemonSet
  • Cluster Size: 6
@opipenbe opipenbe added the bug label Oct 20, 2024
aauren commented Oct 20, 2024

This appears to have happened when we switched gRPC client creation from grpc.DialContext() to grpc.NewClient(). That change fundamentally altered the default resolver from the passthrough resolver to the dns resolver.

We can see this in the first thing that DialContext() does when it enters the function: https://github.com/grpc/grpc-go/blob/98959d9a4904e98bbf8b423ce6a3cb5d36f90ee1/clientconn.go#L228

We probably need to force the passthrough resolver to fix this problem.
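For illustration, here is a minimal sketch (not the actual patch in #1756) of forcing the passthrough resolver when creating the client with grpc.NewClient(); the endpoint value used here is hypothetical:

    package main

    import (
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // Hypothetical endpoint for illustration only.
        endpoint := "192.168.6.78:50051"

        // grpc.NewClient() defaults to the "dns" resolver, which can return zero
        // addresses for targets that are not resolvable hostnames. Prefixing the
        // target with the "passthrough" scheme hands the address to the dialer
        // unmodified, matching the old grpc.DialContext() behavior.
        conn, err := grpc.NewClient(
            "passthrough:///"+endpoint,
            grpc.WithTransportCredentials(insecure.NewCredentials()),
        )
        if err != nil {
            log.Fatalf("failed to create gRPC client: %v", err)
        }
        defer conn.Close()
    }

Another option would be calling resolver.SetDefaultScheme("passthrough") from google.golang.org/grpc/resolver early at startup, before any clients are created.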


aauren commented Oct 20, 2024

@opipenbe can you try the fix on #1756 and let me know how it works for you?

opipenbe (Author) commented:

Thank you @aauren! I just built an image from #1756 and it resolved this issue.


aauren commented Oct 21, 2024

v2.2.2 has been released with this fix.

The containers will probably take another ~45 minutes to finish building and publishing: https://github.com/cloudnativelabs/kube-router/actions/runs/11448400678
