
[cinder-csi-plugin] node plugin pods restart with health check failing #1674

Closed
bhachn opened this issue Oct 22, 2021 · 8 comments

bhachn commented Oct 22, 2021

BUG Report
cinder-csi-plugin

What happened:
The openstack-cinder-csi node plugin pod is getting restarted, with the health check failing with the error below:

health check failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

The liveness container of the pod is being restarted.
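To confirm which container is triggering the restarts and to capture its logs, something like the following can be used (the namespace, label selector, and container name here are assumptions based on typical cinder-csi node-plugin deployments; adjust to the actual manifest):

```shell
# List the cinder-csi node-plugin pods and their restart counts
kubectl -n kube-system get pods -l app=openstack-cinder-csi-nodeplugin

# Show recent events for the failing pod, including which container restarted
kubectl -n kube-system describe pod <csi-cinder-nodeplugin-pod>

# Logs from the liveness sidecar's previous (crashed) instance;
# the container name "liveness-probe" is an assumption
kubectl -n kube-system logs <csi-cinder-nodeplugin-pod> -c liveness-probe --previous
```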

What you expected to happen:
Pods should not restart, and the service should run smoothly.

Anything else we need to know?:
The load on the server is not high and is well under limits, at around ~2% CPU utilization.

Environment: Production

  • OpenStack version: openstack-cinder-csi-1.4.6
  • K8s version 1.20.7
@jichenjc (Contributor) commented:

@bhachn Can you provide the configuration and logs of your node plugin?
I assume some errors occurred earlier, so that information should be helpful.


bhachn commented Oct 25, 2021

@jichenjc, please find attached the logs from all the containers running inside the pod.

cinder-csi.log
cinder-csi-liveness.log
cinder-csi-node-driver.log

@jichenjc (Contributor) commented:

Looks like the error log shows:

I1023 01:09:45.610277       1 connection.go:153] Connecting to unix:///csi/csi.sock
E1023 01:09:46.610083       1 main.go:74] health check failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

and 

E1020 14:32:46.634270       1 connection.go:131] Lost connection to unix:///csi/csi.sock.
E1020 15:14:46.631193       1 connection.go:131] Lost connection to unix:///csi/csi.sock.
E1020 20:06:46.630356       1 connection.go:131] Lost connection to unix:///csi/csi.sock.

I don't know the reason behind the lost connection; it's strange. @ramineni, have you seen this before?
How about enabling debug logging (add --v=5 to the pod startup command) so that we might get further info?
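As a rough sketch of that suggestion: the verbosity flag goes into the container args of the node-plugin DaemonSet. The container name and other args below are assumptions based on typical cinder-csi-plugin manifests; only the `--v=5` line is the actual change being proposed:

```yaml
# Fragment of the cinder-csi node-plugin DaemonSet spec (names are illustrative)
containers:
  - name: cinder-csi-plugin
    args:
      - /bin/cinder-csi-plugin
      - --endpoint=$(CSI_ENDPOINT)
      - --cloud-config=$(CLOUD_CONFIG)
      - --v=5   # raise log verbosity for debugging
```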

@ramineni (Contributor) commented:

@bhachn Is this happening on all nodes or only some nodes have this problem?
Also, could you check whether restarting the pod resolves the error? That is, simply restart the pod by deleting it and letting the replicaset/daemonset run it again (not redeploying).
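A minimal sketch of that restart-by-delete (pod name, namespace, and label selector are placeholders; the DaemonSet recreates the pod automatically):

```shell
# Delete the misbehaving node-plugin pod; its DaemonSet recreates it
kubectl -n kube-system delete pod <csi-cinder-nodeplugin-xxxxx>

# Watch the replacement come up and confirm the restart count stays at 0
kubectl -n kube-system get pods -w -l app=openstack-cinder-csi-nodeplugin
```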

This issue looks similar to kubernetes-csi/node-driver-registrar#139


bhachn commented Oct 26, 2021

@ramineni
It's happening on a single node. As suggested, we've restarted the pod and will monitor it for a day; we will update here if we experience the same again.

ramineni changed the title cinder-csi node plugin pods restart with health check failing → [cinder-csi-plugin] node plugin pods restart with health check failing Oct 27, 2021

bhachn commented Oct 27, 2021

@ramineni
We checked, and after the restart things look better.
However, we would like to observe it this week and will update here if any abnormal behavior is observed.

@ramineni (Contributor) commented:

@bhachn Thanks for the update.
As I mentioned above, the issue is related to node-driver-registrar, not the plugin itself.
I suppose we can close the issue in this repo, and you can track kubernetes-csi/node-driver-registrar#139 for updates.


bhachn commented Oct 29, 2021

Thanks @ramineni.
Closing the issue.

@bhachn bhachn closed this as completed Oct 29, 2021