
Autoruler deployment is not created in control plane cluster #149

Open
traghave123 opened this issue Jul 29, 2021 · 6 comments

Comments

@traghave123

Steps followed:

  1. Deploy control plane cluster with 3 master nodes using ztp playbooks
  2. Import control plane cluster into RHACM deployed in management cluster leo
  3. Add worker node to control plane cluster

We observed that the CSRs are not getting auto-approved:

[taurus@taurus-kvm-server rhacm]$ oc get csr
NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   CONDITION
csr-lb47b   54m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-p4r9z   8m49s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-rj8fk   24m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-rx8pg   39m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-tq7wm   69m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

While debugging, we found that the pods we believe are responsible for auto-approving CSRs are not created:

[taurus@taurus-kvm-server rhacm]$ oc get pods -n node-autolabeler
No resources found in node-autolabeler namespace.

Could you please help here?
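Until the autoruler deployment exists, the pending CSRs can be approved by hand with `oc adm certificate approve <name>`. A minimal sketch of that interim workaround — `pending_csrs` and `approve_all_pending` are illustrative names, not part of the project, and `oc` on the PATH is assumed:

```python
import subprocess

def pending_csrs(oc_get_csr_output: str) -> list[str]:
    """Return the names of CSRs whose CONDITION column is Pending."""
    names = []
    for line in oc_get_csr_output.splitlines():
        fields = line.split()
        # Skip the header row and anything not in Pending state.
        if fields and fields[0] != "NAME" and fields[-1] == "Pending":
            names.append(fields[0])
    return names

def approve_all_pending() -> None:
    # Equivalent shell pipeline:
    #   oc get csr | awk '$NF=="Pending"{print $1}' | xargs oc adm certificate approve
    out = subprocess.run(["oc", "get", "csr"],
                         capture_output=True, text=True, check=True).stdout
    for name in pending_csrs(out):
        subprocess.run(["oc", "adm", "certificate", "approve", name], check=True)
```

Note that kubelet serving CSRs may keep appearing as a node joins, so the approval may need to be repeated until all of a node's certificates are issued.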


traghave123 commented Jul 29, 2021

I tried creating the artifacts below manually:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: autoruler
  name: autoruler
  namespace: node-autolabeler
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: autoruler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: autoruler
    spec:
      containers:
      - image: quay.io/karmab/autosigner:latest
        imagePullPolicy: Always
        name: autosigner
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - image: quay.io/karmab/autolabeller:latest
        imagePullPolicy: Always
        name: autolabeller
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
status: {}

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: autoruler-sa-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
  
---  
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: autoruler-sa-rolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: autoruler-sa-role
subjects:
  - kind: ServiceAccount
    name: default
    namespace: node-autolabeler

This time the CSRs got approved automatically.

However, we think these artifacts should be created automatically. Could you let us know what we are missing so that they are created automatically in the control plane cluster?


yrobla commented Aug 3, 2021

They should be created automatically, because they are part of the day 2 configuration. Can you share the repository and configuration that you are using to configure your clusters? Thanks.

@traghave123

Hi @yrobla

We found that there was an issue in the generated RHACM manifests; after fixing it, the node-labeller pods are running.
Below is the repo/path we are using to create the RHACM policy:

https://github.com/traghave123/test-ran-manifests/tree/master/rhacm-manifests

However, CSR auto-approval is now failing intermittently with the errors below. Could you please help?

Incorrect group in csr csr-hgl9p. Ignoring
Signing server cert csr-zprlf
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/autosigner.py", line 96, in watch_csrs
    certs_api.replace_certificate_signing_request_approval(csr_name, body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1439, in replace_certificate_signing_request_approval
    return self.replace_certificate_signing_request_approval_with_http_info(name, body, **kwargs)  # noqa: E501
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api/certificates_v1beta1_api.py", line 1548, in replace_certificate_signing_request_approval_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 405, in request
    body=body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 290, in PUT
    body=body)
  File "/usr/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (409)
Reason: Conflict
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'a1223e40-645e-4310-82e3-2623a948f2bb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Warning': '299 - "certificates.k8s.io/v1beta1 CertificateSigningRequest is deprecated in v1.19+, unavailable in v1.22+; use certificates.k8s.io/v1 CertificateSigningRequest"', 'X-Kubernetes-Pf-Flowschema-Uid': '8045ad41-3b0e-4264-86e3-1f03a8185467', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b40c123a-2f0e-4194-856d-ae119ea2d75b', 'Date': 'Thu, 15 Jul 2021 03:18:40 GMT', 'Content-Length': '396'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on certificatesigningrequests.certificates.k8s.io \"csr-zprlf\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"csr-zprlf","group":"certificates.k8s.io","kind":"certificatesigningrequests"},"code":409}

Also please find the attached file with full logs.
ErrorDuringCSRApproval.txt
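The 409 Conflict above is the API server's optimistic-concurrency check: the CSR's resourceVersion changed between the read and the replace call, so the PUT is rejected. The standard remedy is to re-fetch the object at its latest resourceVersion and retry the approval. A hypothetical retry wrapper illustrating the pattern — `ConflictError`, `fetch_latest`, and `approve` are stand-ins for the real kubernetes-client calls, not the actual autosigner code:

```python
import time

class ConflictError(Exception):
    """Stand-in for kubernetes.client.exceptions.ApiException with status 409."""

def approve_with_retry(fetch_latest, approve, name, retries=5, backoff=0.0):
    """Retry an approval that fails with 409 Conflict.

    fetch_latest(name) must return the object at its latest resourceVersion;
    approve(name, body) performs the replace call and raises ConflictError on 409.
    """
    for attempt in range(retries):
        body = fetch_latest(name)  # work on the newest resourceVersion each try
        try:
            return approve(name, body)
        except ConflictError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff * (attempt + 1))  # optional linear backoff
```

The deprecation warning in the response headers is a separate issue: the traceback shows the `certificates.k8s.io/v1beta1` API, which is removed in Kubernetes 1.22, so the image will eventually need to move to the `certificates.k8s.io/v1` CSR API as well.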


yrobla commented Aug 5, 2021

Is this problem still happening? After getting some feedback, it seems this problem is temporary and should disappear after the auto-approver retries.

@traghave123

@yrobla Yeah, this happens randomly.
Sometimes when a worker node is added, its CSR does not get approved. When that happens, we need to restart the autolabeller pod with the command below, after which auto-approval works again:
oc delete pod autoruler-68f74b547c-gdwgh -n node-autolabeler
Kindly help us avoid these pod restarts; as you can see, there are errors in the pod's logs.
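Until a fixed image is in place, the restart can at least be scripted against the Deployment itself (`oc rollout restart deployment/autoruler -n node-autolabeler`), which avoids looking up a pod name that changes on every restart. A hypothetical watchdog sketch — `parse_age` and `needs_restart` are illustrative names, not part of the project, and `oc` on the PATH is assumed:

```python
import re
import subprocess

def parse_age(age: str) -> int:
    """Convert an oc AGE string such as '8m49s', '54m', or '2h3m' to seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", age))

def needs_restart(csr_lines: list[str], threshold_s: int = 600) -> bool:
    """True if any Pending CSR has been waiting longer than the threshold."""
    for line in csr_lines:
        fields = line.split()
        if (len(fields) >= 2 and fields[-1] == "Pending"
                and parse_age(fields[1]) > threshold_s):
            return True
    return False

def restart_autoruler() -> None:
    # Restarts every pod of the deployment; no pod-name lookup required.
    subprocess.run(["oc", "rollout", "restart", "deployment/autoruler",
                    "-n", "node-autolabeler"], check=True)
```

This is only a mitigation for the stuck-approver symptom; the underlying exception still needs the image fix mentioned below.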


yrobla commented Aug 5, 2021

A fix has been pushed to the autolabeler image. Could you redeploy, ensuring that you have the latest images, and check whether the problem is fixed? Thanks.
