read: connection reset by peer #236

Closed
MansurEsm opened this issue Nov 10, 2022 · 12 comments

@MansurEsm

MansurEsm commented Nov 10, 2022

Hello,

I see errors in the controller's log, but the functionality still seems to work.
So my problem is mainly that errors are being written to the logs.

Details:
Kubernetes EKS 1.23
At the moment we run 75 namespaces in total (with two RoleBindings per namespace).
The deployment is configured as follows:

    - --webhook-server-port=9443
    - --metrics-addr=:8080
    - --max-reconciles=30
    - --apiserver-qps-throttle=30
    - --excluded-namespace=flux-system
    - --excluded-namespace=kube-system
    - --excluded-namespace=kube-public
    - --excluded-namespace=hnc-system
    - --excluded-namespace=kube-node-lease
    - --excluded-namespace=ingress-controller
    - --excluded-namespace=observability
    - --excluded-namespace=postgres
    - --excluded-namespace=rabbitmq
    - --enable-internal-cert-management
    - --cert-restart-on-secret-refresh
    - --included-namespace-regex=app-.*

(I already tried adjusting the throttle and reconcile settings, with no success.)

The log outputs a huge amount of entries like these:
2022-11-10T11:52:10.356933556Z {"level":"error","ts":1668081130.3564637,"msg":"http: TLS handshake error from XX.XXX.39.206:60406: EOF"}
2022-11-10T11:54:55.332004115Z {"level":"error","ts":1668081295.331847,"msg":"http: TLS handshake error from XX.XXX.39.206:46584: EOF"}
2022-11-10T11:55:33.518983284Z {"level":"error","ts":1668081333.5188336,"msg":"http: TLS handshake error from XX.XXX.39.206:52420: EOF"}
2022-11-10T11:56:07.074519506Z {"level":"error","ts":1668081367.0743582,"msg":"http: TLS handshake error from XX.XXX.39.206:34722: EOF"}
2022-11-10T11:58:05.740595523Z {"level":"error","ts":1668081485.7404268,"msg":"http: TLS handshake error from XX.XXX.39.206:49758: EOF"}
2022-11-10T11:58:05.795854044Z {"level":"error","ts":1668081485.7956684,"msg":"http: TLS handshake error from XX.XXX.39.206:49764: EOF"}
2022-11-10T11:58:06.018933577Z {"level":"error","ts":1668081486.0187643,"msg":"http: TLS handshake error from XX.XXX.39.206:49786: EOF"}
2022-11-10T11:58:06.118885718Z {"level":"error","ts":1668081486.118721,"msg":"http: TLS handshake error from XX.XXX.39.206:49792: EOF"}
2022-11-10T11:58:06.216421670Z {"level":"error","ts":1668081486.2161145,"msg":"http: TLS handshake error from XX.XXX.39.206:49818: EOF"}
2022-11-10T11:58:06.216464114Z {"level":"error","ts":1668081486.2161825,"msg":"http: TLS handshake error from XX.XXX.39.206:49806: read tcp XX.XXX.181.107:9443->XX.XXX.39.206:49806: read: connection reset by peer"}
2022-11-10T11:59:21.996177659Z {"level":"error","ts":1668081561.9958243,"msg":"http: TLS handshake error from XX.XXX.39.206:37166: EOF"}
2022-11-10T12:00:00.218649715Z {"level":"error","ts":1668081600.218468,"msg":"http: TLS handshake error from XX.XXX.140.169:33712: EOF"}
2022-11-10T12:00:41.321180183Z {"level":"error","ts":1668081641.320983,"msg":"http: TLS handshake error from XX.XXX.39.206:48920: EOF"}
2022-11-10T12:01:39.304614334Z {"level":"error","ts":1668081699.3044322,"msg":"http: TLS handshake error from XX.XXX.39.206:52170: EOF"}
2022-11-10T12:05:48.388023648Z {"level":"error","ts":1668081948.3877704,"msg":"http: TLS handshake error from XX.XXX.39.206:46100: EOF"}
2022-11-10T12:06:47.723495059Z {"level":"error","ts":1668082007.7233665,"msg":"http: TLS handshake error from XX.XXX.39.206:60902: EOF"}
2022-11-10T12:07:37.665469165Z {"level":"error","ts":1668082057.6652558,"msg":"http: TLS handshake error from XX.XXX.39.206:55668: EOF"}

I checked that HNC succeeded as follows:
k get rolebindings -n myParentNS
k get rolebindings -n oneOfTheChildNamespaces

I can see the desired RoleBindings in all namespaces, so there is actually no problem with the result.
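A slightly stricter check, in case it helps: as far as I understand HNC, propagated copies carry an inherited-from label pointing back to the source namespace, so listing that label should distinguish propagated from locally created RoleBindings (the namespace name is a placeholder):

    # Show the inherited-from label as an extra column; propagated copies should
    # list the parent namespace there, local RoleBindings should show nothing.
    kubectl get rolebindings -n oneOfTheChildNamespaces -L hnc.x-k8s.io/inherited-from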

As I understand it, the error describes a timeout somewhere, so I fear that when additional namespaces appear, those new namespaces may not get the changes HNC should apply.

Question:
How can I solve this timeout? Can I increase the timeout somewhere?

Thanks in advance for any suggestions or solutions.

@adrianludwin
Contributor

I've seen the occasional error like this reported but I've never been able to reproduce it. Do you know what's at XX.XXX.39.206?

@MansurEsm
Author

MansurEsm commented Nov 14, 2022

Yes, I have just redacted the IP address.
Sample:
{"level":"error","ts":1668430451.4085166,"msg":"http: TLS handshake error from 10.160.39.206:56810: EOF"}
{"level":"error","ts":1668430504.1096373,"msg":"http: TLS handshake error from 10.160.39.206:34218: EOF"}
2022-11-14T14:11:26.594074106Z {"level":"error","ts":1668435086.5939658,"msg":"http: TLS handshake error from 10.160.140.169:52866: EOF"}

The error appears often, roughly every 2-5 minutes.
I don't understand what IP address it is. It's not one of the nodes, and it's not any pod.
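For reference, a few ways to try to pin down such an address (using the sample address above; the aws command assumes CLI access to the cluster's VPC):

    # Does any pod or node own the address?
    kubectl get pods -A -o wide | grep 10.160.39.206
    kubectl get nodes -o wide | grep 10.160.39.206
    # On EKS the managed control plane reaches webhooks through ENIs it creates in
    # the cluster VPC, so the owning network interface is worth a look; its
    # Description field usually hints at who created it.
    aws ec2 describe-network-interfaces \
      --filters Name=addresses.private-ip-address,Values=10.160.39.206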

@adrianludwin
Contributor

Could it be the control plane (i.e. masters / apiserver)?

@adrianludwin
Contributor

The previous report of this (#49) was also on EKS.

A similar problem on a different project is on Azure: kubernetes-sigs/cluster-api-provider-azure#428

Have you seen any log messages like x509: certificate signed by unknown authority? Maybe in the apiserver logs (see that second issue for details)?

@adrianludwin
Contributor

I wonder if it has something to do with the webhooks. If you try to do something illegal - e.g., deleting a propagated object - does it go through or do you get an error? The webhooks are only there to stop you from doing the wrong thing, so if you only use HNC correctly, everything will appear to work.
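For example, something like this should normally be rejected by HNC's object webhook (the names here are made up):

    # Try to delete a RoleBinding that HNC propagated into a child namespace;
    # a healthy object webhook should deny this with an error message.
    kubectl delete rolebinding some-propagated-rolebinding -n some-child-namespace
    # If the webhook is being bypassed, the delete succeeds silently and HNC just
    # re-creates the propagated copy on its next reconcile.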

Are you using internal cert management (the default) or something like cert-manager?

@MansurEsm
Author

Hi,

Have you seen any log messages like x509: certificate signed by unknown authority? Maybe in the apiserver logs (see that second issue for details)?
Are you using internal cert management (the default) or something like cert-manager?

No certificate errors. No cert-manager.
I use this certificate configuration:

  - --enable-internal-cert-management
  - --cert-restart-on-secret-refresh

I also suspect the webhooks. I will remove them first and see what happens.
Actually, I didn't see an error when doing something illegal, but it's hard to test.

Another suspect is the AWS security group settings. I will check this as well.
I'll come back.

@MansurEsm
Author

MansurEsm commented Nov 15, 2022

Feedback:

Deleting the webhooks:

kind: MutatingWebhookConfiguration
name: namespacelabel.hnc.x-k8s.io
and
kind: ValidatingWebhookConfiguration
name: subnamespaceanchors.hnc.x-k8s.io

... made the TLS handshake errors disappear.
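In case someone wants to try the same, one way to remove a single webhook entry is a JSON patch (the index below is an assumption - inspect the live object first):

    # Find the index of the webhook entry you want to drop
    kubectl get validatingwebhookconfiguration hnc-validating-webhook-configuration -o yaml
    # Remove that entry by index (0 is only an example); the same works for the
    # mutating configuration. Re-applying the HNC manifest restores the entries.
    kubectl patch validatingwebhookconfiguration hnc-validating-webhook-configuration \
      --type=json -p='[{"op":"remove","path":"/webhooks/0"}]'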

Their configuration was as follows:
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  creationTimestamp: null
  name: hnc-mutating-webhook-configuration
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /mutate-namespace
  failurePolicy: Ignore
  name: namespacelabel.hnc.x-k8s.io
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - namespaces
  sideEffects: None
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: hnc-validating-webhook-configuration
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /validate-hnc-x-k8s-io-v1alpha2-subnamespaceanchors
  failurePolicy: Fail
  name: subnamespaceanchors.hnc.x-k8s.io
  rules:
  - apiGroups:
    - hnc.x-k8s.io
    apiVersions:
    - v1alpha2
    operations:
    - CREATE
    - DELETE
    resources:
    - subnamespaceanchors
  sideEffects: None
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /validate-hnc-x-k8s-io-v1alpha2-hierarchyconfigurations
  failurePolicy: Fail
  name: hierarchyconfigurations.hnc.x-k8s.io
  rules:
  - apiGroups:
    - hnc.x-k8s.io
    apiVersions:
    - v1alpha2
    operations:
    - CREATE
    - UPDATE
    resources:
    - hierarchyconfigurations
  sideEffects: None
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /validate-objects
  failurePolicy: Fail
  name: objects.hnc.x-k8s.io
  namespaceSelector:
    matchLabels:
      hnc.x-k8s.io/included-namespace: "true"
  rules:
  - apiGroups:
    - "*"
    apiVersions:
    - "*"
    operations:
    - CREATE
    - UPDATE
    - DELETE
    resources:
    - "*"
    scope: Namespaced
  sideEffects: None
  timeoutSeconds: 4
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /validate-hnc-x-k8s-io-v1alpha2-hncconfigurations
  failurePolicy: Fail
  name: hncconfigurations.hnc.x-k8s.io
  rules:
  - apiGroups:
    - hnc.x-k8s.io
    apiVersions:
    - v1alpha2
    operations:
    - CREATE
    - UPDATE
    - DELETE
    resources:
    - hncconfigurations
  sideEffects: None
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: hnc-webhook-service
      namespace: hnc-system
      path: /validate-v1-namespace
  failurePolicy: Fail
  name: namespaces.hnc.x-k8s.io
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - DELETE
    - CREATE
    - UPDATE
    resources:
    - namespaces
  sideEffects: None

Do you have a hint as to what could cause the issue, so I can investigate further?
I would actually prefer to keep them.

What I do in the background:

  1. I create namespaces (in another repo, via Terraform).
  2. For each namespace (via for_each), I assign k8s role_bindings (and a cluster_role).

@erikgb
Contributor

erikgb commented Nov 15, 2022

One option is to try switching to cert-manager. I've had numerous issues with the cert-rotator.

@adrianludwin
Contributor

adrianludwin commented Nov 15, 2022 via email

@mochizuki875
Member

mochizuki875 commented Jan 5, 2023

I've come across the same issue on kind.
I'm not sure, but it seems to be related to net/http.

Similar issues can be found in other projects:
kubernetes/kubernetes#109022
open-policy-agent/gatekeeper#2142

@adrianludwin
Contributor

Thanks for that, @mochizuki875. Unfortunately the problem seems to be coming from K8s itself (kubernetes/kubernetes#109022), so there's nothing we can do here.

/close

@k8s-ci-robot
Contributor

@adrianludwin: Closing this issue.

In response to this:

Thanks for that, @mochizuki875. Unfortunately the problem seems to be coming from K8s itself (kubernetes/kubernetes#109022), so there's nothing we can do here.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
