Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syslog / kubelet logs cluttered with gRPC messages #2410

Closed
pehlert opened this issue Aug 14, 2021 · 12 comments
Closed

Syslog / kubelet logs cluttered with gRPC messages #2410

pehlert opened this issue Aug 14, 2021 · 12 comments
Labels
bug Something isn't working component/deployment Helm chart, kubernetes templates and configuration Issues/PRs logging The Change is only in logging wontfix This will not be worked on

Comments

@pehlert
Copy link

pehlert commented Aug 14, 2021

Describe the bug

On all our nodes that run ceph-csi, we observe a high number of log messages (every few seconds), seemingly from gRPC that attempts to collect to the Ceph CSI socket.

Example:

Aug 14 06:26:36 k8s-stg-3 kubelet[748]: I0814 06:26:36.096622     748 clientconn.go:948] ClientConn switching balancer to "pick_first"
Aug 14 06:26:36 k8s-stg-3 kubelet[748]: I0814 06:26:36.096699     748 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Aug 14 06:26:36 k8s-stg-3 systemd[2906791]: run-docker-runtime\x2drunc-moby-ec6ffdf97e31d27d4c7fcb0e3ac68405c35873151d6a3e3372a0f89a6e487c94-runc.gwFZTw.mount: Succeeded.
Aug 14 06:26:36 k8s-stg-3 systemd[1]: run-docker-runtime\x2drunc-moby-ec6ffdf97e31d27d4c7fcb0e3ac68405c35873151d6a3e3372a0f89a6e487c94-runc.gwFZTw.mount: Succeeded.
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.651614     748 clientconn.go:106] parsed scheme: ""
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.653041     748 clientconn.go:106] scheme "" not registered, fallback to default scheme
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.653270     748 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.653295     748 clientconn.go:948] ClientConn switching balancer to "pick_first"
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.653426     748 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.656128     748 clientconn.go:106] parsed scheme: ""
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.656187     748 clientconn.go:106] scheme "" not registered, fallback to default scheme
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.656235     748 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.656257     748 clientconn.go:948] ClientConn switching balancer to "pick_first"
Aug 14 06:26:37 k8s-stg-3 kubelet[748]: I0814 06:26:37.656330     748 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.987622     748 clientconn.go:106] parsed scheme: ""
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.987680     748 clientconn.go:106] scheme "" not registered, fallback to default scheme
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.987742     748 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.987757     748 clientconn.go:948] ClientConn switching balancer to "pick_first"
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.987815     748 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.990574     748 clientconn.go:106] parsed scheme: ""
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.990623     748 clientconn.go:106] scheme "" not registered, fallback to default scheme
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.990667     748 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.990696     748 clientconn.go:948] ClientConn switching balancer to "pick_first"
Aug 14 06:26:38 k8s-stg-3 kubelet[748]: I0814 06:26:38.990868     748 clientconn.go:897] blockingPicker: the picked transport is not ready, loop back to repick

Environment details

  • Image/version of Ceph CSI driver : 3.4.0 (but issue has been present for much longer)
  • Helm chart version : n/a
  • Kernel version : 5.4.0-80-generic
  • Mounter used for mounting PVC (for cephfs its fuse or kernel. for rbd its
    krbd or rbd-nbd) : defaults
  • Kubernetes cluster version : 1.20
  • Ceph cluster version : 15.2.13

Expected behavior

These seemingly harmless warnings make the identification of actual errors in the logs much harder and result in unnecessary disk io. They shouldn't be logged when they don't indicate an actual issue.

@nixpanic nixpanic added logging The Change is only in logging component/deployment Helm chart, kubernetes templates and configuration Issues/PRs labels Aug 16, 2021
@nixpanic
Copy link
Member

The /var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock file is listed in the deploy/rbd/kubernetes/csi-rbdplugin.yaml file which is used for the node-plugin driver-registrar container. The same parameter is passed when deploying with Helm (see charts/ceph-csi-rbd/templates/nodeplugin-daemonset.yaml).

Could you update your csi-rbdplugin deamonset and change the

      containers:
        - name: driver-registrar
          image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0
          args:
            - "--v=5"
            - "--csi-address=/csi/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock"

to

      containers:
        - name: driver-registrar
          image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0
          args:
            - "--v=5"
            - "--csi-address=/csi/csi.sock"
            - "--kubelet-registration-path=unix:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock"

So, prefixing the --kubelet-registration-path option with unix:// as that should be the 'scheme' for the socket to use.

@nixpanic nixpanic added the bug Something isn't working label Aug 16, 2021
@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 16, 2021

@nixpanic are you working on this one?

@nixpanic
Copy link
Member

@nixpanic are you working on this one?

Not really, just asking if the suggestion addresses the issue. Maybe @pehlert wants to send a PR for correcting all --kubelet-registration-path= options if it fixes the problem.

@pehlert
Copy link
Author

pehlert commented Aug 16, 2021

@nixpanic Thanks for getting back on this! I could only briefly test without checking out any details yet.
However the pod will not come up anymore when I make that suggested change:

» kl -n ceph-csi-rbd logs ceph-csi-rbd-nodeplugin-644fj -c driver-registrar
I0816 08:09:58.400352 3998992 main.go:113] Version: v2.2.0
I0816 08:09:58.401235 3998992 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0816 08:09:58.401271 3998992 connection.go:153] Connecting to unix:///csi/csi.sock
I0816 08:09:58.402717 3998992 main.go:144] Calling CSI driver to discover driver name
I0816 08:09:58.402739 3998992 connection.go:182] GRPC call: /csi.v1.Identity/GetPluginInfo
I0816 08:09:58.402746 3998992 connection.go:183] GRPC request: {}
I0816 08:09:58.408175 3998992 connection.go:185] GRPC response: {"name":"rbd.csi.ceph.com","vendor_version":"v3.4.0"}
I0816 08:09:58.408298 3998992 connection.go:186] GRPC error: <nil>
I0816 08:09:58.408313 3998992 main.go:154] CSI driver name: "rbd.csi.ceph.com"
I0816 08:09:58.408405 3998992 node_register.go:52] Starting Registration Server at: /registration/rbd.csi.ceph.com-reg.sock
I0816 08:09:58.408699 3998992 node_register.go:61] Registration Server started at: /registration/rbd.csi.ceph.com-reg.sock
I0816 08:09:58.408928 3998992 node_register.go:83] Skipping healthz server because HTTP endpoint is set to: ""
I0816 08:09:59.785091 3998992 main.go:80] Received GetInfo call: &InfoRequest{}
I0816 08:09:59.792469 3998992 main.go:90] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock: connect: no such file or directory",}
E0816 08:09:59.792633 3998992 main.go:92] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix unix:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock: connect: no such file or directory", restarting registration container.

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 16, 2021

Looking at https://github.com/grpc/grpc-go/blob/master/clientconn.go#L251-L268 it might work to reduce scheme "" not registered, fallback to default scheme not the other log messages but worth to try, updating example/doc at https://github.com/kubernetes-csi/node-driver-registrar might also help other drivers also.

@nixpanic
Copy link
Member

Thanks for testing @pehlert! I think the hostPath mount/volume is missing for the driver-registrar container as well. this is documented at https://github.com/kubernetes-csi/node-driver-registrar#required-permissions. There is already a plugin-dir volume that points to /var/lib/kubelet/plugins, so adding that should be relatively simple.

Is that something you can try out easily?

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 16, 2021

@pehlert can you also try - --kubelet-registration-path=passthrough:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock ?

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Aug 16, 2021

the open issue already exists at kubernetes-csi/node-driver-registrar#88

@pehlert
Copy link
Author

pehlert commented Aug 16, 2021

@pehlert can you also try - --kubelet-registration-path=passthrough:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock ?

I tried this and while the pod came up just fine and seemingly worked as expected, it didn't have any effect on the warnings.
They were still showing in the same verbosity and frequency.

@humblec
Copy link
Collaborator

humblec commented Aug 18, 2021

@pehlert can you also try - --kubelet-registration-path=passthrough:///var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock ?

I tried this and while the pod came up just fine and seemingly worked as expected, it didn't have any effect on the warnings.
They were still showing in the same verbosity and frequency.

@pehlert does decreasing the log level helps?
also it looks to me scenario with docker engine, but not fully sure on this.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the wontfix This will not be worked on label Sep 17, 2021
@github-actions
Copy link

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/deployment Helm chart, kubernetes templates and configuration Issues/PRs logging The Change is only in logging wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants