
Sidecar injection fails when Operator ready after target Pods #1765

Open
dacamposol opened this issue May 24, 2023 · 14 comments
Labels: area:collector, bug

Comments

@dacamposol

Hello everyone,

First of all, thank you for the effort put into the Operator, as it is a really useful tool to enhance the instrumentation in Kubernetes environments.

I am currently using the OpenTelemetryCollector as a sidecar, with the following configuration:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: sidecar-collector
spec:
  mode: sidecar
  envFrom:
    - secretRef:
        name: {{ .Values.telemetry.secretRef }}
  securityContext:
    runAsNonRoot: true
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"

    processors:

    extensions:
      health_check: {}

    exporters:
      otlp:
        endpoint: "${env:traces-endpoint}:${env:traces-port}"
        headers:
          Authorization: "${env:traces-auth-header}"

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [otlp]

The target Deployments have the annotation sidecar.opentelemetry.io/inject: "true" in the Pod template, and overall it works without any issues. However, I seem to have a race condition: if the Deployment creates its Pods before the OpenTelemetry Operator is ready, the target Pods never get the sidecar injected. This only happens when I wake my cluster up after hibernation.
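
For reference, the annotation is set on the Pod template's metadata, roughly like this (a minimal sketch; the Deployment name and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # illustrative name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        sidecar.opentelemetry.io/inject: "true"
    spec:
      containers:
        - name: my-app
          image: my-app:latest       # illustrative image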

I noticed that the Istio Operator recreates such Pods once it's ready, correctly injecting the Envoy proxies as sidecars into all the Pods with the corresponding annotation.

Is there any way to tell the OpenTelemetry Operator to recreate any Pod that doesn't already have the sidecar injected when it analyzes the Pods and Namespaces on startup in search of the inject annotation?

dacamposol changed the title from "Sidecar not being injected if Operator deployed after target Deployment" to "Sidecar injection fails when Operator ready after target Pods" on May 24, 2023
@TylerHelmuth
Member

I did a quick look through the sidecar injection logic and I didn't see anything that stuck out at me as to why this wouldn't be working as you expected.

Does the OpenTelemetryCollector object exist already when you bring the cluster out of hibernation? What do the operator logs show? I am hoping there is a failed to select an OpenTelemetry Collector instance for this pod's sidecar log in there somewhere when this race condition happens.

TylerHelmuth added the bug and area:collector labels on May 24, 2023
@dacamposol
Author

I did a quick look through the sidecar injection logic and I didn't see anything that stuck out at me as to why this wouldn't be working as you expected.

Does the OpenTelemetryCollector object exist already when you bring the cluster out of hibernation? What do the operator logs show? I am hoping there is a failed to select an OpenTelemetry Collector instance for this pod's sidecar log in there somewhere when this race condition happens.

Hello Tyler, thank you for your quick response.

The OpenTelemetryCollector object gets deployed at the same time as the operator itself, and it should already exist when we come out of hibernation.

I have one Argo Application, let's call it Application A, that is an umbrella Chart with dependency on the chart for the Operator and some extra templates, which includes the secret for the environment defined in the OpenTelemetryCollector resource. Additionally, I have another Argo Application, which is just a compilation of the required resources for deploying my application (a Service, a Deployment, some RBAC resources, etc...)

I don't see any errors in the operator logs:

{"level":"info","ts":"2023-05-24T04:16:00Z","msg":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.76.1","opentelemetry-collector":"otel/opentelemetry-collector-contrib:0.76.1","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.76.1","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.76.1","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.25.1","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.38.0","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.38b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.7.0","feature-gates":"operator.autoinstrumentation.dotnet,operator.autoinstrumentation.java,operator.autoinstrumentation.nodejs,operator.autoinstrumentation.python,-operator.collector.rewritetargetallocator","build-date":"2023-05-09T13:57:45Z","go-version":"go1.20.4","go-arch":"amd64","go-os":"linux","labels-filter":[]}
{"level":"info","ts":"2023-05-24T04:16:00Z","logger":"setup","msg":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
I0524 04:16:01.754669       1 request.go:690] Waited for 1.047851876s due to client-side throttling, not priority and fairness, request: GET:https://api.d1eu1.dxp-d1.internal.canary.k8s.ondemand.com:443/apis/sql.cnrm.cloud.google.com/v1beta1?timeout=32s
{"level":"info","ts":"2023-05-24T04:16:05Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":"0.0.0.0:8080"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opentelemetrycollector"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook.webhooks","msg":"Starting webhook server"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.webhook","msg":"Serving webhook server","host":"","port":9443}
{"level":"info","ts":"2023-05-24T04:16:06Z","msg":"Starting server","path":"/metrics","kind":"metrics","addr":"[::]:8080"}
{"level":"info","ts":"2023-05-24T04:16:06Z","msg":"Starting server","kind":"health probe","addr":"[::]:8081"}
{"level":"info","ts":"2023-05-24T04:16:06Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
I0524 04:16:06.066356       1 leaderelection.go:248] attempting to acquire leader lease monitoring/9f7554c3.opentelemetry.io...
I0524 04:19:14.429811       1 leaderelection.go:258] successfully acquired lease monitoring/9f7554c3.opentelemetry.io
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"instrumentation-upgrade","msg":"looking for managed Instrumentation instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"collector-upgrade","msg":"looking for managed instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1alpha1.OpenTelemetryCollector"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"info","ts":"2023-05-24T04:19:14Z","msg":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"instrumentation-upgrade","msg":"no instances to upgrade"}
{"level":"info","ts":"2023-05-24T04:19:14Z","logger":"collector-upgrade","msg":"skipping upgrade for OpenTelemetry Collector instance, as it's newer than our latest version","name":"sidecar-collector","namespace":"dxp-system","version":"0.76.1","latest":"0.61.0"}
{"level":"info","ts":"2023-05-24T04:19:17Z","msg":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}

When I checked the status of my Pod in ArgoCD (after some hours), I noticed that there were only 2 containers in it (the application container and the envoy proxy). Once I manually deleted the Pod, it got recreated without any issue and the sidecar was properly injected, even though there were some error messages in the operator logs:

{"level":"info","ts":"2023-05-24T13:23:18Z","msg":"couldn't determine metrics port from configuration, using 8888 default value","error":"missing port in address"}
{"level":"error","ts":"2023-05-24T13:23:18Z","msg":"Cannot create liveness probe.","error":"service property in the configuration doesn't contain extensions","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/collector.Container\n\t/workspace/pkg/collector/container.go:127\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.add\n\t/workspace/pkg/sidecar/pod.go:43\ngithub.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/workspace/pkg/sidecar/podmutator.go:100\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhookhandler.(*podSidecarInjector).Handle\n\t/workspace/internal/webhookhandler/webhookhandler.go:92\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:169\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:98\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:146\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:108\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2122\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2500\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

If you say there isn't any logic that prevents "old" Pods from being restarted once the operator is ready, the only other thing I can think of is the dependency on a Secret to populate the sidecar's environment... Maybe the Secret wasn't ready at that point, and instead of failing the deployment the operator just skipped injecting the sidecar and never retried?
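
(For reference, assuming the envFrom from the OpenTelemetryCollector spec is copied onto the injected container, a missing Secret would only surface when that container starts rather than at injection time. A rough, trimmed sketch of the injected sidecar container, with illustrative values:)

containers:
  - name: otc-container                  # illustrative container name
    image: otel/opentelemetry-collector-contrib:0.76.1
    envFrom:
      - secretRef:
          name: my-telemetry-secret      # hypothetical Secret name; resolved when the container starts, not at injection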

@TylerHelmuth
Member

I'm not sure, this is gonna take more digging into.

@dacamposol
Author

dacamposol commented May 24, 2023

@TylerHelmuth regarding my last hypothesis, it's not that: I can verify there is no problem with the Secret not being ready.

I performed a test where I deployed a different OpenTelemetryCollector in an isolated namespace, referring to a non-existent Secret.

When I deployed the following example Pod:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: telemetry-poc
  annotations:
    sidecar.opentelemetry.io/inject: "true"
spec:
  containers:
  - image: busybox
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always

The collector was successfully injected as a sidecar; it just returns a CreateContainerConfigError since it cannot find the referenced Secret, but that's expected.

My problem is that the sidecar doesn't even get injected. I'm going to perform further testing.


Additional Info:

I scaled the operator's replicas down to zero and redeployed the aforementioned busybox Pod. Once the Pod was ready (1/1 containers running), I scaled the operator back up:

The Pod, even though it has the correct annotation, doesn't get recreated and the sidecar is never injected. The sidecar is only injected once I recreate the Pod myself.

I'm using the following image of the operator:

ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:v0.76.1

@dacamposol
Author

dacamposol commented May 25, 2023

@TylerHelmuth I found the issue.

Expected Behaviour: Istio Example

Istio prevents Pods from being created before the admission webhook is ready through a fail-closed configuration in the MutatingWebhookConfiguration, where they specify .failurePolicy: Fail. If the istio-sidecar-injector isn't ready, the Pod won't be created¹:

Internal error occurred: failed calling admission webhook "istio-sidecar-injector.istio.io": \
    Post https://istio-sidecar-injector.istio-system.svc:443/admitPilot?timeout=30s: \
    no endpoints available for service "istio-sidecar-injector"

Current Behaviour

When we deploy the opentelemetry-operator Chart, there is the .admissionWebhooks.failurePolicy value, which defaults to Fail, but the problem is that it is not taken into account for the Pod mutation webhook. No matter what we configure, the Chart sets its .failurePolicy to Ignore.

From my personal point of view, I'd expect the user who deploys the Chart to have full control over the failurePolicy of the webhooks, but I haven't created a PR yet since I don't know whether this was an architectural decision on your side.

If not, please let me know and I'll be glad to create a fix.
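
For illustration, a fail-closed Pod webhook entry in the rendered MutatingWebhookConfiguration would look roughly like this (a sketch based on the chart's mpod.kb.io entry shown in full later in this thread, with only failurePolicy changed):

- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-v1-pod
  failurePolicy: Fail   # fail-closed: Pod admission is rejected while the webhook is unreachable
  name: mpod.kb.io
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
  sideEffects: None
  timeoutSeconds: 10

With that in place, Pods created before the webhook is reachable are rejected and retried by their controllers instead of silently starting without the sidecar.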

Footnotes

  1. See https://istio.io/latest/docs/ops/common-problems/injection/#no-such-hosts-or-no-endpoints-available-errors-in-deployment-status

@TylerHelmuth
Member

TylerHelmuth commented May 25, 2023

It's always istio lol good find

@winston0410

winston0410 commented Jun 10, 2023

Hi, I am experiencing this issue as well, and I am not using Istio. I have tried setting admissionWebhooks.pods.failurePolicy to Fail, but then the operator never starts. Is this issue simply caused by the order in which the resources are applied?

This is the value of my chart:

helmCharts:
  - name: opentelemetry-operator
    namespace: opentelemetry
    releaseName: opentelemetry-operator
    includeCRDs: true
    version: 0.31.0
    repo: https://open-telemetry.github.io/opentelemetry-helm-charts
    valuesInline:
      manager:
        serviceMonitor:
          enabled: true
        prometheusRule:
          enabled: true
          defaultRules:
            enabled: true
      kubeRBACProxy:
        enabled: false
      admissionWebhooks:
        pods:
          failurePolicy: Ignore
        certManager:
          enabled: true
          create: true
        autoGenerateCert: false

@dacamposol
Author

dacamposol commented Jun 11, 2023

@winston0410 that's weird, unless you marked the operator itself for sidecar injection (either via the annotation on the Namespace or on the operator's own Pod). Since the logic is part of a MutatingWebhookConfiguration, that could be what's preventing the operator from starting.

Out of curiosity, and for the sake of testing it out, could you provide the manifests of the operator's Pod and Namespace here?

@winston0410

Sure, these are the generated manifests, excluding CRDs:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator
  namespace: opentelemetry
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-leader-election
  namespace: opentelemetry
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - ""
  resources:
  - configmaps/status
  verbs:
  - get
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-manager
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - apps
  resources:
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - list
  - update
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - opentelemetry.io
  resources:
  - instrumentations
  verbs:
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - opentelemetry.io
  resources:
  - opentelemetrycollectors
  verbs:
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - opentelemetry.io
  resources:
  - opentelemetrycollectors/finalizers
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - opentelemetry.io
  resources:
  - opentelemetrycollectors/status
  verbs:
  - get
  - patch
  - update
- apiGroups:
  - route.openshift.io
  resources:
  - routes
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-leader-election
  namespace: opentelemetry
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: opentelemetry-operator-leader-election
subjects:
- kind: ServiceAccount
  name: opentelemetry-operator
  namespace: opentelemetry
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opentelemetry-operator-manager
subjects:
- kind: ServiceAccount
  name: opentelemetry-operator
  namespace: opentelemetry
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator
  namespace: opentelemetry
spec:
  ports:
  - name: metrics
    port: 8080
    protocol: TCP
    targetPort: metrics
  selector:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-webhook
  namespace: opentelemetry
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: webhook-server
  selector:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator
  namespace: opentelemetry
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: controller-manager
      app.kubernetes.io/name: opentelemetry-operator
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      labels:
        app.kubernetes.io/component: controller-manager
        app.kubernetes.io/name: opentelemetry-operator
    spec:
      containers:
      - args:
        - --metrics-addr=0.0.0.0:8080
        - --enable-leader-election
        - --health-probe-addr=:8081
        - --webhook-port=9443
        - --collector-image=otel/opentelemetry-collector-contrib:0.78.0
        command:
        - /manager
        env:
        - name: ENABLE_WEBHOOKS
          value: "true"
        image: ghcr.io/open-telemetry/opentelemetry-operator/opentelemetry-operator:v0.78.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8081
          initialDelaySeconds: 15
          periodSeconds: 20
        name: manager
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        - containerPort: 9443
          name: webhook-server
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
          requests:
            cpu: 100m
            memory: 64Mi
        volumeMounts:
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      hostNetwork: false
      securityContext:
        fsGroup: 65532
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65532
      serviceAccountName: opentelemetry-operator
      terminationGracePeriodSeconds: 10
      volumes:
      - name: cert
        secret:
          defaultMode: 420
          secretName: opentelemetry-operator-controller-manager-service-cert
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-serving-cert
  namespace: opentelemetry
spec:
  dnsNames:
  - opentelemetry-operator-webhook.opentelemetry.svc
  - opentelemetry-operator-webhook.opentelemetry.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: opentelemetry-operator-selfsigned-issuer
  secretName: opentelemetry-operator-controller-manager-service-cert
  subject:
    organizationalUnits:
    - opentelemetry-operator
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-selfsigned-issuer
  namespace: opentelemetry
spec:
  selfSigned: {}
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator
  namespace: opentelemetry
spec:
  groups:
  - name: managerRules
    rules:
    - alert: ReconcileErrors
      annotations:
        description: 'Reconciliation errors for {{ $labels.controller }} is increasing
          and has now reached {{ humanize $value }} '
        runbook_url: Check manager logs for reasons why this might happen
      expr: rate(controller_runtime_reconcile_total{result="error"}[5m]) > 0
      for: 5m
      labels:
        severity: warning
    - alert: WorkqueueDepth
      annotations:
        description: 'Queue depth for {{ $labels.name }} has reached {{ $value }} '
        runbook_url: Check manager logs for reasons why this might happen
      expr: workqueue_depth > 0
      for: 5m
      labels:
        severity: warning
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator
  namespace: opentelemetry
spec:
  endpoints:
  - port: metrics
  namespaceSelector:
    matchNames:
    - opentelemetry
  selector:
    matchLabels:
      app.kubernetes.io/component: controller-manager
      app.kubernetes.io/name: opentelemetry-operator
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    helm.sh/hook: test
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-cert-manager
  namespace: opentelemetry
spec:
  containers:
  - command:
    - sh
    - -c
    - |
      wget_output=$(wget -q "$CERT_MANAGER_CLUSTERIP:$CERT_MANAGER_PORT")
      if wget_output=="wget: server returned error: HTTP/1.0 400 Bad Request"
      then exit 0
      else exit 1
      fi
    env:
    - name: CERT_MANAGER_CLUSTERIP
      value: cert-manager-webhook
    - name: CERT_MANAGER_PORT
      value: "443"
    image: busybox:latest
    name: wget
  restartPolicy: Never
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    helm.sh/hook: test
  labels:
    app.kubernetes.io/component: controller-manager
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-webhook
  namespace: opentelemetry
spec:
  containers:
  - command:
    - sh
    - -c
    - |
      wget_output=$(wget -q "$WEBHOOK_SERVICE_CLUSTERIP:$WEBHOOK_SERVICE_PORT")
      if wget_output=="wget: server returned error: HTTP/1.0 400 Bad Request"
      then exit 0
      else exit 1
      fi
    env:
    - name: WEBHOOK_SERVICE_CLUSTERIP
      value: opentelemetry-operator-webhook
    - name: WEBHOOK_SERVICE_PORT
      value: "443"
    image: busybox:latest
    name: wget
  restartPolicy: Never
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    cert-manager.io/inject-ca-from: opentelemetry/opentelemetry-operator-serving-cert
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-mutation
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-opentelemetry-io-v1alpha1-instrumentation
  failurePolicy: Fail
  name: minstrumentation.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - CREATE
    - UPDATE
    resources:
    - instrumentations
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-opentelemetry-io-v1alpha1-opentelemetrycollector
  failurePolicy: Fail
  name: mopentelemetrycollector.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - CREATE
    - UPDATE
    resources:
    - opentelemetrycollectors
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-v1-pod
  failurePolicy: Ignore
  name: mpod.kb.io
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
  sideEffects: None
  timeoutSeconds: 10
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    cert-manager.io/inject-ca-from: opentelemetry/opentelemetry-operator-serving-cert
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.78.0
    helm.sh/chart: opentelemetry-operator-0.31.0
  name: opentelemetry-operator-validation
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /validate-opentelemetry-io-v1alpha1-instrumentation
  failurePolicy: Fail
  name: vinstrumentationcreateupdate.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - CREATE
    - UPDATE
    resources:
    - instrumentations
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /validate-opentelemetry-io-v1alpha1-instrumentation
  failurePolicy: Ignore
  name: vinstrumentationdelete.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - DELETE
    resources:
    - instrumentations
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
  failurePolicy: Fail
  name: vopentelemetrycollectorcreateupdate.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - CREATE
    - UPDATE
    resources:
    - opentelemetrycollectors
  sideEffects: None
  timeoutSeconds: 10
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /validate-opentelemetry-io-v1alpha1-opentelemetrycollector
  failurePolicy: Ignore
  name: vopentelemetrycollectordelete.kb.io
  rules:
  - apiGroups:
    - opentelemetry.io
    apiVersions:
    - v1alpha1
    operations:
    - DELETE
    resources:
    - opentelemetrycollectors
  sideEffects: None
  timeoutSeconds: 10

The Helm chart manifest:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: opentelemetry

resources:
  - ./resources/index.yaml # this simply create the namespace

helmCharts:
  - name: opentelemetry-operator
    namespace: opentelemetry
    releaseName: opentelemetry-operator
    includeCRDs: true
    version: 0.31.0
    repo: https://open-telemetry.github.io/opentelemetry-helm-charts
    valuesInline:
      manager:
        serviceMonitor:
          enabled: true
        prometheusRule:
          enabled: true
          defaultRules:
            enabled: true
      kubeRBACProxy:
        enabled: false
      admissionWebhooks:
        pods:
          # REF https://github.com/open-telemetry/opentelemetry-operator/issues/1765
          failurePolicy: Ignore
        certManager:
          enabled: true
          create: true
        autoGenerateCert: false

@yuriolisa
Contributor

Hi @dacamposol, are you willing to send a fix for that issue?

@dacamposol
Author

@yuriolisa I'm investigating what could be the reason for the Operator not starting up normally when the injection isn't in place.

I'll update with my findings.

@dacamposol
Author

@winston0410 I found the problem; it's just a misconfiguration in your files.

I noticed that you're using the default MutatingWebhookConfiguration, without setting a proper .objectSelector on it.

Concretely, the problem is in the following section:

- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-v1-pod
  failurePolicy: Ignore
  name: mpod.kb.io
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
  sideEffects: None
  timeoutSeconds: 10

As you can see, the mutation operation applies to all Pods in the cluster, not only to the ones into which you actually want to inject the sidecar.

I don't know how it works with Kustomize, but in Helm you would just have to set the .admissionWebhooks values with a proper selector.

For example, let's say that you're deploying your OpenTelemetry Operator in the monitoring namespace, then you'd have to add something like:

admissionWebhooks:
  namespaceSelector:
    matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values:
          - kube-system
          - monitoring

In this way, you prevent resources in the kube-system namespace, as well as the operator itself, from waiting on the admission webhook.


Anyway, @yuriolisa, I don't have much time right now, but I'd like to propose adding a different objectSelector for each of the resources in the Chart. While it's not directly related to this issue, I think being able to define an objectSelector for the pods webhook that differs from the one for the opentelemetrycollector resources would offer a lot of flexibility (a rough sketch is below).

If not implemented, I'll try to do it in the future.
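
As a rough sketch of what a per-webhook objectSelector could look like on the rendered Pod webhook (the label key below is purely illustrative):

- name: mpod.kb.io
  clientConfig:
    service:
      name: opentelemetry-operator-webhook
      namespace: opentelemetry
      path: /mutate-v1-pod
  failurePolicy: Fail
  objectSelector:
    matchLabels:
      sidecar.opentelemetry.io/opt-in: "true"   # illustrative label; workloads would opt in explicitly
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods
  sideEffects: None

objectSelector is a standard field on admission webhooks, so scoping the Pod webhook this way would avoid gating every Pod in the cluster on the operator's availability.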

@jaronoff97
Contributor

I think this is a flavor of #1329

@csyyy106

{"level":"ERROR","timestamp":"2024-11-10T03:17:43.458903251Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(*podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(*ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(*conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"}
{"level":"ERROR","timestamp":"2024-11-10T03:17:47.414812688Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(*podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(*ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(*conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"}
{"level":"ERROR","timestamp":"2024-11-10T03:17:49.759572162Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(*podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(*ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(*conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"}
{"level":"INFO","timestamp":"2024-11-10T03:22:23.832875828Z","logger":"controllers.OpenTelemetryCollector","message":"pdb field is unset in Spec, creating default"}
{"level":"ERROR","timestamp":"2024-11-10T04:02:28.930318531Z","message":"failed to select an OpenTelemetry Collector instance for this pod's sidecar","namespace":"test","name":"","error":"no OpenTelemetry Collector instances available","stacktrace":"github.com/open-telemetry/opentelemetry-operator/pkg/sidecar.(*sidecarPodMutator).Mutate\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/pkg/sidecar/podmutator.go:84\ngithub.com/open-telemetry/opentelemetry-operator/internal/webhook/podmutation.(*podMutationWebhook).Handle\n\t/home/runner/work/opentelemetry-operator/opentelemetry-operator/internal/webhook/podmutation/webhookhandler.go:93\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/webhook.go:181\nsigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP\n\t/home/runner/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/webhook/admission/http.go:119\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/metrics.InstrumentedHook.InstrumentHandlerInFlight.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:60\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:147\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2\n\t/home/runner/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:109\nnet/http.HandlerFunc.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2171\nnet/http.(*ServeMux).ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2688\nnet/http.serverHandler.ServeHTTP\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:3142\nnet/http.(*conn).serve\n\t/opt/hostedtoolcache/go/1.22.7/x64/src/net/http/server.go:2044"}
Logs from November 9 to November 10, 2024 (UTC)
[screenshot omitted]

I clearly have the OpenTelemetryCollector resource, so why is there still an error? Why can't it find this resource?
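
(A guess, not something confirmed in this thread: the sidecar mutator may only select OpenTelemetryCollector instances with mode: sidecar that live in the same namespace as the Pod being admitted, unless the annotation names an instance explicitly. A sketch of the explicit form, with an illustrative value:)

metadata:
  annotations:
    # instead of "true", reference the collector instance by name (illustrative value)
    sidecar.opentelemetry.io/inject: "sidecar-collector"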
