HTTP service discovery #16

errm · 2023-11-23T10:34:21Z

In our environment we use the Prometheus Operator to configure Prometheus for us.

Whilst it is possible to use file_sd in our environment, it's more of a pain to configure: we would need to add a sidecar to our Prometheus, plus a shared volume so the file can be read.

With http_sd we can just run a Deployment of prometheus-msk-discovery and drop a ScrapeConfig resource into our manifests.

The shape of the API is exactly the same; the only difference is that the output needs to be JSON rather than YAML.
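To illustrate the shape being described, here is a minimal sketch of a target group serialized the way Prometheus HTTP SD expects: a JSON list of objects with `targets` and optional `labels`. The `targetGroup` type, the `encodeTargetGroups` helper, and the example host and label are hypothetical names chosen for illustration, not taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetGroup is a hypothetical type mirroring the static-config shape
// that Prometheus reads from both file_sd and http_sd.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// encodeTargetGroups renders the groups as the JSON list http_sd expects.
// A nil slice is encoded as "[]" rather than "null", since the endpoint
// should return an empty list when there are no targets.
func encodeTargetGroups(groups []targetGroup) (string, error) {
	if groups == nil {
		groups = []targetGroup{}
	}
	b, err := json.Marshal(groups)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeTargetGroups([]targetGroup{{
		Targets: []string{"b-1.example.internal:11001"}, // made-up broker address
		Labels:  map[string]string{"job": "msk-example"}, // made-up label
	}})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
}
```

The same struct could be marshalled to YAML for the file-based mode, which is why only the encoding step differs between the two.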

I added a flag, -http-sd, to enable this second mode. Without it everything should behave exactly as it did before, so as not to cause any issues for existing users.
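As a rough sketch of how an opt-in flag like this keeps the old behaviour as the default, the snippet below parses `-http-sd` as a boolean that defaults to false. Treating the flag as a boolean, along with the `parseMode` helper and the help text, is an assumption for illustration; the PR's actual implementation may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseMode reports whether HTTP SD mode was requested.
// Assumption: -http-sd is a boolean flag defaulting to false, so existing
// invocations without the flag keep the file-based behaviour.
func parseMode(args []string) (httpSD bool, err error) {
	fs := flag.NewFlagSet("prometheus-msk-discovery", flag.ContinueOnError)
	fs.BoolVar(&httpSD, "http-sd", false, "serve targets over HTTP instead of writing a file")
	err = fs.Parse(args)
	return httpSD, err
}

func main() {
	httpSD, err := parseMode(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if httpSD {
		fmt.Println("mode: http_sd")
	} else {
		fmt.Println("mode: file_sd")
	}
}
```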

joshm91 · 2023-11-24T22:55:29Z

Hey, thanks for this!

The only thing I'm slightly unsure about is that the MSK clusters get scraped on every request to the HTTP endpoint you've added, especially given that the existing interval flag is meant to control that scraping frequency.

What do you think about changing it so that the current infinite for loop runs in both modes, scraping for clusters at the set interval and maintaining an internal state for the latest fetch? With file_sd mode enabled, this state gets written to a file at the same frequency; with http_sd mode enabled, the handler just reads and returns the internal state rather than initiating its own scrape.
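The alternative proposed above could look roughly like the following sketch: a background loop refreshes a shared cache at a fixed interval, and the HTTP handler only reads that cache. All names here (`cache`, `refreshLoop`, the stubbed `discover` function, the port, and the interval) are hypothetical; this is not the code in the PR.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
	"time"
)

// targetGroup is a hypothetical type matching the http_sd response shape.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// cache holds the result of the most recent discovery run.
type cache struct {
	mu     sync.RWMutex
	groups []targetGroup
}

func (c *cache) set(groups []targetGroup) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.groups = groups
}

func (c *cache) get() []targetGroup {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.groups
}

// refreshLoop runs discover immediately and then once per interval,
// storing each successful result, until stop is closed.
func refreshLoop(c *cache, discover func() ([]targetGroup, error), interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if groups, err := discover(); err != nil {
			log.Printf("discovery failed, keeping previous state: %v", err)
		} else {
			c.set(groups)
		}
		select {
		case <-ticker.C:
		case <-stop:
			return
		}
	}
}

// ServeHTTP returns the cached state without triggering a new discovery.
func (c *cache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	groups := c.get()
	if groups == nil {
		groups = []targetGroup{} // encode as [] rather than null
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(groups); err != nil {
		log.Printf("writing response: %v", err)
	}
}

func main() {
	c := &cache{}
	// Stub standing in for the real MSK lookup (assumption).
	discover := func() ([]targetGroup, error) {
		return []targetGroup{{Targets: []string{"b-1.example.internal:11001"}}}, nil
	}
	go refreshLoop(c, discover, 5*time.Minute, make(chan struct{}))
	log.Fatal(http.ListenAndServe(":8080", c))
}
```

In this design the number of discovery calls depends only on the interval, not on how often the endpoint is requested, at the cost of serving data that may be up to one interval old.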

errm · 2023-11-27T13:55:51Z

Hi @joshm91, with http_sd you can set how often Prometheus scrapes the endpoint via the refresh_interval setting in the Prometheus config.

I am not 100% sure about service discovery endpoints, but for exporters it is usually best practice not to do any calculations or calls until the endpoint is actually scraped.

If the discovered targets are refreshed at a different rate from the rate at which Prometheus calls the endpoint, then one of two things could happen:

Prometheus calls the endpoint, but the data is stale by some unknown amount of time.
Or, more likely, Prometheus is configured to scrape the endpoint less often than the refresh interval, so we end up calling the AWS API more often than required (and the data may well still be stale at the point when the endpoint is scraped).

With file_sd, because Prometheus can monitor the file, we (prometheus-msk-discovery) are essentially pushing any updates to Prometheus. With http_sd, Prometheus itself is in control of how much polling to do, so I think it makes more sense to have Prometheus control the interval via the refresh_interval attribute in its config.
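The on-demand behaviour argued for here can be sketched as a handler that runs discovery once per incoming request, so the polling rate is whatever Prometheus is configured to use. The names `render`, `httpSDHandler`, the stubbed `discover` function, and the port are hypothetical and only illustrate the idea.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// targetGroup is a hypothetical type matching the http_sd response shape.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// render runs discovery once and returns the status code and body to send.
func render(discover func() ([]targetGroup, error)) (int, []byte) {
	groups, err := discover()
	if err != nil {
		return http.StatusInternalServerError, []byte(err.Error())
	}
	if groups == nil {
		groups = []targetGroup{} // encode as [] rather than null
	}
	body, err := json.Marshal(groups)
	if err != nil {
		return http.StatusInternalServerError, []byte(err.Error())
	}
	return http.StatusOK, body
}

// httpSDHandler triggers a fresh discovery for every request, so the
// upstream API is called exactly as often as Prometheus polls.
func httpSDHandler(discover func() ([]targetGroup, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		status, body := render(discover)
		if status == http.StatusOK {
			w.Header().Set("Content-Type", "application/json")
		}
		w.WriteHeader(status)
		if _, err := w.Write(body); err != nil {
			log.Printf("writing response: %v", err)
		}
	}
}

func main() {
	// Stub standing in for the real MSK lookup (assumption).
	discover := func() ([]targetGroup, error) {
		return []targetGroup{{Targets: []string{"b-1.example.internal:11001"}}}, nil
	}
	http.Handle("/", httpSDHandler(discover))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this shape there is no second interval to keep in sync: each response reflects a lookup made at request time, and when a refresh fails Prometheus keeps using the target list it already has.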

joshm91 · 2023-11-27T21:30:26Z

Thanks for the clarification - that seems totally reasonable.

I'll merge and release shortly.

errm and others added 4 commits November 23, 2023 09:56

Extract fileSD function

a0b2984

Add option for http service discovery

ac3fad0

Update docs

de928db

Fix code block in readme

c38f254

Clarify scrape interval parameter

c905815

joshm91 merged commit c686e6d into statsbomb:main Nov 27, 2023
1 check passed
