read: connection reset by peer #236
Comments
I've seen the occasional error like this reported but I've never been able to reproduce it. Do you know what's at XX.XXX.39.206? |
Yes. I have just redacted the IP address. The error appears often, roughly every 2-5 minutes. |
Could it be the control plane (i.e. masters / apiserver)? |
Prior version (#49) was also on EKS. A similar problem on a different project is on Azure: kubernetes-sigs/cluster-api-provider-azure#428 Have you seen any log messages like |
I wonder if it has something to do with the webhooks. If you try to do something illegal - e.g., deleting a propagated object - does it go through or do you get an error? The webhooks are only there to stop you from doing the wrong thing, so if you only use HNC correctly, everything would appear to work. Are you using internal cert management (the default) or something like |
Hi,
No certificate errors. No cert-manager.
I also suspect the webhooks. I will remove them first and see what happens. Another suspect is the AWS security group setting. I will check this as well. |
Feedback: Deleting the webhooks (kind: MutatingWebhookConfiguration ...) made the TLS handshake error disappear. Their configuration was:
apiVersion: admissionregistration.k8s.io/v1
Do you have a hint about what could cause the issue, so I can investigate further? What I do in the background:
|
One option is to try switching to cert-manager. I've had numerous issues with the cert-rotator. |
If there are no certificate errors, then switching from cert-rotator to
cert-manager doesn't seem too likely to solve the issue but you can
certainly try. Maybe Erik can give you the instructions :) Did you check
the apiserver logs as well?
Try re-installing the webhooks, then create two namespaces (kubectl create
ns foo) and then try to make them parents of each other (kubectl hns set
--parent foo bar, then kubectl hns set --parent bar foo). If the first one
fails, that means the webhooks aren't responding properly - I doubt that's
the case, or you would have seen it already. If the second one *succeeds*
it means they're somehow being skipped.
If they both work, then the problem's probably not on the HNC side - it's
probably from the EKS control plane. After all, the log is saying that the
*client* did something wrong - it sent a zero-length handshake. There's not
much HNC can do about that.
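
The check described above can be scripted. The following is a minimal sketch in Python, assuming `kubectl` and the `kubectl hns` plugin are on the PATH. The function names and the verdict strings are illustrative only and are not part of HNC; the commands themselves are the ones quoted above.

```python
import subprocess


def run_kubectl(args):
    """Run kubectl with the given arguments; return True if it exited with 0."""
    return subprocess.run(["kubectl", *args], capture_output=True).returncode == 0


def diagnose_webhooks(run=run_kubectl):
    """Try to make foo and bar parents of each other and interpret the result.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    # Create the two test namespaces (an "already exists" failure is ignored).
    run(["create", "ns", "foo"])
    run(["create", "ns", "bar"])

    # First assignment is legal and should be accepted.
    if not run(["hns", "set", "--parent", "foo", "bar"]):
        return "first set failed: webhooks are not responding properly"

    # Second assignment would create a cycle and should be rejected.
    if run(["hns", "set", "--parent", "bar", "foo"]):
        return "second set succeeded: webhooks are being skipped"

    return "both behaved as expected: problem is probably not on the HNC side"


if __name__ == "__main__":
    print(diagnose_webhooks())
```

Remember to delete the `foo` and `bar` namespaces afterwards if you run this against a real cluster.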
|
I've come across the same issue on kind. Similar issues can be found in other projects. |
Thanks for that @mochizuki875 . Unfortunately the problem seems to be coming from K8s itself (kubernetes/kubernetes#109022) so there's nothing we can do here. /close |
@adrianludwin: Closing this issue. |
Hello,
I see errors in the controller's log,
but the functionality still seems to work.
So my main problem is that errors are being written to the logs.
Details:
Kubernetes EKS 1.23
At the moment we run 75 namespaces in total (with two rolebindings per namespace).
The Configuration of the deployment is this:
- --metrics-addr=:8080
- --max-reconciles=30
- --apiserver-qps-throttle=30
- --excluded-namespace=flux-system
- --excluded-namespace=kube-system
- --excluded-namespace=kube-public
- --excluded-namespace=hnc-system
- --excluded-namespace=kube-node-lease
- --excluded-namespace=ingress-controller
- --excluded-namespace=observability
- --excluded-namespace=postgres
- --excluded-namespace=rabbitmq
- --enable-internal-cert-management
- --cert-restart-on-secret-refresh
- --included-namespace-regex=app-.*
(I have already tried adjusting the throttle and reconcile settings, with no success.)
The log contains a huge number of entries like these:
2022-11-10T11:52:10.356933556Z {"level":"error","ts":1668081130.3564637,"msg":"http: TLS handshake error from XX.XXX.39.206:60406: EOF"}
2022-11-10T11:54:55.332004115Z {"level":"error","ts":1668081295.331847,"msg":"http: TLS handshake error from XX.XXX.39.206:46584: EOF"}
2022-11-10T11:55:33.518983284Z {"level":"error","ts":1668081333.5188336,"msg":"http: TLS handshake error from XX.XXX.39.206:52420: EOF"}
2022-11-10T11:56:07.074519506Z {"level":"error","ts":1668081367.0743582,"msg":"http: TLS handshake error from XX.XXX.39.206:34722: EOF"}
2022-11-10T11:58:05.740595523Z {"level":"error","ts":1668081485.7404268,"msg":"http: TLS handshake error from XX.XXX.39.206:49758: EOF"}
2022-11-10T11:58:05.795854044Z {"level":"error","ts":1668081485.7956684,"msg":"http: TLS handshake error from XX.XXX.39.206:49764: EOF"}
2022-11-10T11:58:06.018933577Z {"level":"error","ts":1668081486.0187643,"msg":"http: TLS handshake error from XX.XXX.39.206:49786: EOF"}
2022-11-10T11:58:06.118885718Z {"level":"error","ts":1668081486.118721,"msg":"http: TLS handshake error from XX.XXX.39.206:49792: EOF"}
2022-11-10T11:58:06.216421670Z {"level":"error","ts":1668081486.2161145,"msg":"http: TLS handshake error from XX.XXX.39.206:49818: EOF"}
2022-11-10T11:58:06.216464114Z {"level":"error","ts":1668081486.2161825,"msg":"http: TLS handshake error from XX.XXX.39.206:49806: read tcp XX.XXX.181.107:9443->XX.XXX.39.206:49806: read: connection reset by peer"}
2022-11-10T11:59:21.996177659Z {"level":"error","ts":1668081561.9958243,"msg":"http: TLS handshake error from XX.XXX.39.206:37166: EOF"}
2022-11-10T12:00:00.218649715Z {"level":"error","ts":1668081600.218468,"msg":"http: TLS handshake error from XX.XXX.140.169:33712: EOF"}
2022-11-10T12:00:41.321180183Z {"level":"error","ts":1668081641.320983,"msg":"http: TLS handshake error from XX.XXX.39.206:48920: EOF"}
2022-11-10T12:01:39.304614334Z {"level":"error","ts":1668081699.3044322,"msg":"http: TLS handshake error from XX.XXX.39.206:52170: EOF"}
2022-11-10T12:05:48.388023648Z {"level":"error","ts":1668081948.3877704,"msg":"http: TLS handshake error from XX.XXX.39.206:46100: EOF"}
2022-11-10T12:06:47.723495059Z {"level":"error","ts":1668082007.7233665,"msg":"http: TLS handshake error from XX.XXX.39.206:60902: EOF"}
2022-11-10T12:07:37.665469165Z {"level":"error","ts":1668082057.6652558,"msg":"http: TLS handshake error from XX.XXX.39.206:55668: EOF"}
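
To see which peers cause the errors and how often, log lines in this format can be summarized with a short script. This is only a sketch: it assumes each line contains one JSON object with `ts` and `msg` fields as in the excerpt above, and the names `summarize` and `HANDSHAKE_RE` are made up for this example.

```python
import json
import re
from collections import Counter

# Message format as it appears in the controller log excerpt.
HANDSHAKE_RE = re.compile(
    r"http: TLS handshake error from (?P<ip>[^:]+):(?P<port>\d+): (?P<reason>.*)"
)


def summarize(lines):
    """Return (counts per (source IP, reason), gaps in seconds between errors)."""
    counts = Counter()
    stamps = []
    for line in lines:
        # Ignore any timestamp prefix or terminal decoration around the JSON.
        start, end = line.find("{"), line.rfind("}")
        if start == -1 or end < start:
            continue
        try:
            entry = json.loads(line[start:end + 1])
        except json.JSONDecodeError:
            continue
        match = HANDSHAKE_RE.match(entry.get("msg", ""))
        if match is None:
            continue
        reason = match["reason"]
        if reason.endswith("connection reset by peer"):
            reason = "connection reset by peer"
        counts[(match["ip"], reason)] += 1
        stamps.append(float(entry["ts"]))
    stamps.sort()
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return counts, gaps


if __name__ == "__main__":
    sample = [
        '2022-11-10T11:52:10.356933556Z {"level":"error","ts":1668081130.3564637,'
        '"msg":"http: TLS handshake error from XX.XXX.39.206:60406: EOF"}',
    ]
    print(summarize(sample))
```

If nearly all errors come from one or two addresses, checking whether those addresses belong to the control plane is a natural next step.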
I checked that HNC is working as follows:
k get rolebindings -n myParentNS -n oneOfTheChildNamespaces
I see that the desired rolebindings are present in all namespaces,
so there is actually no problem with the result.
As I understand it, the error describes a timeout somewhere, so I fear that when additional namespaces are added, they may not receive the changes HNC should apply.
Question:
How can I solve this timeout? Is there a timeout setting I can increase?
Thanks in advance for any suggestions or solutions.