Observability Kubernetes Onboarding doesn't ship data #5613

Open
flash1293 opened this issue Sep 25, 2024 · 5 comments
Labels
bug (Something isn't working), Team:obs-ds-hosted-services (Label for the Observability Hosted Services team)

Comments

@flash1293

flash1293 commented Sep 25, 2024

Following the Kubernetes onboarding flow on serverless (Add data > Monitor Infrastructure > Kubernetes) doesn't ship data. This can be reproduced on a serverless observability project and was tested with minikube running on Mac.

The logs show lots of errors like this:

{"log.level":"error","@timestamp":"2024-09-25T08:15:35.683Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":665},"message":"Unit state changed kubernetes/metrics-default-kubernetes-node-metrics-kubernetes-bcdc9c26-d274-4db3-95e0-0bb396fdd402 (STARTING->FAILED): Failed: pid '295' exited with code '-1'","log":{"source":"elastic-agent"},"component":{"id":"kubernetes/metrics-default","state":"FAILED"},"unit":{"id":"kubernetes/metrics-default-kubernetes-node-metrics-kubernetes-bcdc9c26-d274-4db3-95e0-0bb396fdd402","type":"input","state":"FAILED","old_state":"STARTING"},"ecs.version":"1.6.0"}
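Log lines like the one above are single JSON objects, so the failing unit and its state transition can be pulled out mechanically when scanning a large agent log. The following is a minimal sketch, not part of the agent tooling: the function name `summarize_unit_failure` is hypothetical, and the sample record is an abbreviated copy of the line above that keeps only the fields the sketch reads.

```python
import json


def summarize_unit_failure(line: str):
    """Return a short summary if the JSON log line reports a FAILED unit, else None."""
    record = json.loads(line)
    unit = record.get("unit", {})
    if unit.get("state") != "FAILED":
        return None
    return {
        "timestamp": record.get("@timestamp"),
        "component": record.get("component", {}).get("id"),
        "unit": unit.get("id"),
        "transition": f'{unit.get("old_state")}->{unit.get("state")}',
        "message": record.get("message"),
    }


# Abbreviated copy of the log line above (log.origin and ecs.version omitted).
UNIT_ID = (
    "kubernetes/metrics-default-kubernetes-node-metrics-kubernetes-"
    "bcdc9c26-d274-4db3-95e0-0bb396fdd402"
)
sample_line = json.dumps(
    {
        "log.level": "error",
        "@timestamp": "2024-09-25T08:15:35.683Z",
        "message": f"Unit state changed {UNIT_ID} (STARTING->FAILED): "
        "Failed: pid '295' exited with code '-1'",
        "component": {"id": "kubernetes/metrics-default", "state": "FAILED"},
        "unit": {
            "id": UNIT_ID,
            "type": "input",
            "state": "FAILED",
            "old_state": "STARTING",
        },
    }
)

summary = summarize_unit_failure(sample_line)
print(summary["component"], summary["transition"])
```

The summary only restates what the log already says (which unit failed and from which state); it does not reveal why the process exited.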

This may also be a problem on the Kibana side of the flow. I'm starting here for troubleshooting; we can move the issue if it turns out to be unrelated.

One suspicion is that this is a resourcing problem and the agent now needs more memory, but this still needs to be confirmed.

@flash1293 added the bug and Team:obs-ds-hosted-services labels on Sep 25, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@flash1293 changed the title from "Observability Kubernetes Onboarding doesn't work" to "Observability Kubernetes Onboarding doesn't ship data" on Sep 25, 2024
@MichaelKatsoulis
Contributor

MichaelKatsoulis commented Sep 25, 2024

This log error does not say anything about why the process crashed. We would need to reproduce the environment and check the diagnostics and the agent pod's resource consumption.

@flash1293
Author

@MichaelKatsoulis

I started a local minikube cluster, then followed the onboarding flow from a fresh Observability serverless project on prod.

@MichaelKatsoulis
Contributor

I replicated the scenario:

  1. Kind cluster with 38 pods and 3 nodes
  2. Fresh serverless project

I followed the instructions for monitoring Kubernetes as if I were a first-time user.

I noticed the following:

  1. The kustomize command attempts to override the Elasticsearch host by setting
-e "s/%ES_HOST%/https:\/\/katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud/g"

When the host has no port, Elastic-Agent appends the default port, 9200. ES_HOST therefore ends up as https://katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud:9200
This leads to a connection refused error. To work around this, we need to change the ES_HOST substitution to

-e "s/%ES_HOST%/https:\/\/katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud:443/g"
  2. With the port set, Elastic-Agent starts successfully and data flows.
    Image

  3. The first thing a user sees is a link to a dashboard that does not exist.
    Image

  4. In Discover, we can see metrics and logs.
    Image

  5. After a few minutes, we see the first restart of one of the agent's pods.
    Image

  6. The reason for the restart is OOMKilled.
    Image
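The OOM check in the last step can also be done without screenshots by reading the pod status. The following is a minimal sketch, assuming pod JSON in the shape returned by `kubectl get pod <name> -o json`; the helper name `oom_killed_containers` and the sample pod (container name, restart count) are hypothetical illustration values, not data from this issue.

```python
def oom_killed_containers(pod: dict):
    """Return (container_name, restart_count) for containers last terminated by the OOM killer."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container has been terminated before.
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            results.append((status.get("name"), status.get("restartCount", 0)))
    return results


# Hypothetical, trimmed pod status used only to exercise the helper.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "elastic-agent",
                "restartCount": 1,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            {"name": "sidecar", "restartCount": 0, "lastState": {}},
        ]
    }
}

print(oom_killed_containers(sample_pod))
```

A non-empty result points at the container memory limit rather than at the agent configuration.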

Conclusion:

Restarts:
As per my analysis and tests in #4729 (comment),
in version 8.15.1 elastic-agent with the Kubernetes and System integrations needs more than 700 MB of memory.
The memory limit in the onboarding manifest is set below that, which causes the restarts.
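To make the comparison concrete, a configured limit can be checked against the roughly 700 MB requirement reported above. This is a rough sketch under stated assumptions: the parser handles only a small subset of Kubernetes quantity suffixes, the helper names are hypothetical, and the limits passed in below are made-up examples because the thread does not state the actual limit in the onboarding manifest.

```python
# Subset of Kubernetes quantity suffixes: binary (Ki, Mi, Gi) and decimal (k, M, G).
_UNITS = {
    "Ki": 1024,
    "Mi": 1024**2,
    "Gi": 1024**3,
    "k": 1000,
    "M": 1000**2,
    "G": 1000**3,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '700Mi' or '1G' to bytes."""
    for suffix in ("Ki", "Mi", "Gi", "k", "M", "G"):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain byte count


def limit_is_sufficient(limit: str, required: str) -> bool:
    """True if the configured limit is at least the required amount."""
    return parse_memory(limit) >= parse_memory(required)


# Hypothetical limits compared with the ~700 MB figure from the analysis above.
for candidate in ("500Mi", "1Gi"):
    print(candidate, limit_is_sufficient(candidate, "700M"))
```

Since the agent needs more than 700 MB, a limit that only just passes this check would likely still be too tight.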

Dashboard
Should the Kubernetes integration, which contains the assets, also be installed under the hood?

ES_HOST
We should always set the Elasticsearch port explicitly, because if it is not set, the agent appends 9200.
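One way to guarantee an explicit port is to normalize the host URL before it is substituted into the manifest. The sketch below is only an illustration of that idea, not the fix that was applied in Kibana or the agent: the helper names `with_explicit_port` and `sed_expression` are hypothetical, and defaulting to 443 for https and 80 for http is an assumption based on the workaround described above.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed scheme defaults; the agent itself falls back to 9200 when no port is given.
DEFAULT_PORTS = {"https": 443, "http": 80}


def with_explicit_port(url: str) -> str:
    """Return the URL unchanged if it has a port, else append the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return url
    port = DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")
    return urlunsplit(parts._replace(netloc=f"{parts.netloc}:{port}"))


def sed_expression(url: str) -> str:
    """Build the %ES_HOST% sed substitution, escaping slashes as in the command above."""
    escaped = with_explicit_port(url).replace("/", "\\/")
    return f"s/%ES_HOST%/{escaped}/g"


host = "https://katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud"
print(with_explicit_port(host))
print(sed_expression(host))
```

URLs that already carry a port are passed through untouched, so the helper is safe to apply unconditionally.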

@flash1293
Author

Thanks for the investigation @MichaelKatsoulis !

Restarts:
As per my analysis and tests in #4729 (comment)
in version 8.15.1 elastic-agent with Kubernetes and system integration needs more than 700Mb of memory.
So the limit is set low causing restarts.

I guess this is something that needs to be changed on the elastic-agent side, right?

Should also Kubernetes Integration be installed under the hood which contains the assets?

Good catch. It seems the ID of the dashboard changed in this PR: elastic/integrations#10593. We should fix it in the short term, but we also need to think about how to make this whole process more stable.

We should always set the port of Elasticsearch because if not set, agent appends 9200.

I see, I think in a previous version it would append it, but the config value we pull this from changed. We can fix this on the Kibana side as well.


3 participants