
Does usage of target-allocator reduce load on api-server #3529

Open

sfc-gh-akrishnan opened this issue Dec 10, 2024 · 1 comment

Comments

sfc-gh-akrishnan commented Dec 10, 2024

Component(s)

target allocator

Describe the issue you're reporting

I am running a DaemonSet of the OpenTelemetry Collector in tandem with a single target allocator instance using the per-node allocation strategy. All scrape targets are node-local, and I use kubernetes_sd_config for service discovery.

I compared this setup against a plain DaemonSet of otel-collectors, each using relabel_config together with kubernetes_sd_config to filter down to the node-local pods to scrape; a sketch of that filtering follows below.
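A minimal sketch of what I mean by that per-node filtering, assuming the node name is injected into each collector pod via the downward API as NODE_NAME (the job name and env-var wiring are illustrative, not my exact config):

    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: node-local-pods          # illustrative job name
              kubernetes_sd_configs:
                - role: pod                      # every collector pod runs this discovery itself
              relabel_configs:
                # Keep only pods scheduled on the same node as this collector.
                # NODE_NAME is assumed to come from the downward API (spec.nodeName).
                - source_labels: [__meta_kubernetes_pod_node_name]
                  action: keep
                  regex: ${env:NODE_NAME}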

Since the Target Allocator (TA) documentation reads:

The TA is a mechanism for decoupling the service discovery and metric collection functions of Prometheus such that they can be scaled independently

I expected the load on the api-server to go down with Otel + TA compared to Otel alone. However, what I observe is that the load on the api-server is similar with and without the TA.

Can you clarify whether there is a gap in my understanding, or whether there is a tunable I can configure?

Sample TA config:

    # Used by TargetAllocator watcher to discover Otel-Collector pods using labels
    collector_selector:
      matchlabels:
        cluster-addon-name: otel-collector

    # Algorithm to use to allocate endpoints amongst Otel-Collector pods
    allocation_strategy: per-node

    # Since we are using `per-node` allocation strategy, this would not take effect
    # for endpoints which are not associated with any node (e.g. apiserver)
    # For those cases we use the fallback strategy
    allocation_fallback_strategy: least-weighted

    # Should relabel-config be respected? (Yes)
    filter_strategy: relabel-config

    # Actual receiver config
    config:
      scrape_configs:
        ...
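For contrast with the non-TA setup above, my understanding is that in this setup the kubernetes_sd_configs live here, inside the TA's own scrape_configs, so only the TA should be watching the API server. A hypothetical sketch of what such a scrape_configs block can look like, with the job name and annotation filter invented for illustration:

    config:
      scrape_configs:
        - job_name: node-local-pods              # hypothetical job name
          kubernetes_sd_configs:
            - role: pod                          # discovery runs in the TA, not in each collector
          relabel_configs:
            # Hypothetical filter: keep only pods opted in via annotation
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"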

Sample Otel config:

    receivers:
      prometheus:
        target_allocator:
          endpoint: http://target-allocator-service.system-metrics.svc.internal
          interval: 60s
          collector_id: "${POD_NAME}"

    processors:
      batch:
        send_batch_size: 1000
        timeout: 5s
      memory_limiter:
        limit_mib: 2500
        spike_limit_mib: 150
        check_interval: 5s
    ...
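To complete the picture, the receiver above is wired into a pipeline along the following lines (the exporter name is hypothetical; the real config continues where the elision above is):

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [memory_limiter, batch]   # memory_limiter first, then batch
          exporters: [prometheusremotewrite]    # hypothetical exporter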
swiatekm (Contributor) commented:

Generally speaking, the target allocator does the service discovery, so the collector Pods shouldn't need to talk to the API server for that purpose, at least. Could you post your full Collector manifests for both cases?
