Make metric cardinality limit configurable (OpenTelemetry Protocol (OTLP) exporter) #6445

Open
frittentheke opened this issue Dec 12, 2024 · 0 comments

Is your feature request related to a problem? Please describe.

The router lets you configure value extraction to add labels, and thus cardinality, to the exported metrics (https://www.apollographql.com/docs/graphos/reference/router/telemetry/metrics-exporters/otlp).

The opentelemetry-sdk crate used by the router gained a hard limit of 2000 on label cardinality in [release 0.20.0](https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-sdk/CHANGELOG.md#v0200), via PR open-telemetry/opentelemetry-rust#1066.

Metrics / streams with a cardinality exceeding 2000 are only emitted via the overflow-tagged metric:
apollo_router_http_requests_total{job="router", otel_metric_overflow="true", otel_scope_name="apollo/router"} 1
which causes all of that cardinality to be lost / dropped. A warning is also logged (see #5287): OpenTelemetry metric error occurred: Metrics error: Warning: Maximum data points for metric stream exceeded/ Entry added to overflow
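
To illustrate the effect, here is a minimal Python simulation of a counter with a cardinality cap. It is only a sketch of the behaviour described above, not the opentelemetry-rust implementation: the class name `CappedCounter`, the `http_status` label, and the exact accounting (whether the overflow series counts toward the limit) are assumptions made for illustration.

```python
from collections import Counter

CARDINALITY_LIMIT = 2000  # the hard limit described in this issue
# Single series that absorbs everything beyond the cap (attribute name taken from the metric above).
OVERFLOW_KEY = (("otel_metric_overflow", "true"),)


class CappedCounter:
    """Toy counter: new attribute sets go to one overflow series once the cap is reached."""

    def __init__(self, limit=CARDINALITY_LIMIT):
        self.limit = limit
        self.series = Counter()

    def add(self, value, attributes):
        key = tuple(sorted(attributes.items()))
        # Already-known attribute sets keep their own series; unseen ones past the cap are folded.
        if key not in self.series and len(self.series) >= self.limit:
            key = OVERFLOW_KEY
        self.series[key] += value


# Small limit so the effect is visible: five distinct label values, cap of three.
counter = CappedCounter(limit=3)
for status in ("200", "404", "500", "502", "503"):
    counter.add(1, {"http_status": status})

overflowed = counter.series[OVERFLOW_KEY]
```

In this toy run the last two label values end up in the overflow series, so their individual counts can no longer be told apart, which is the loss of detail described above.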

This protection is much appreciated and in line with the [OTel spec](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.39.0/specification/metrics/sdk.md#cardinality-limits), but the limit is not configurable yet, so 2000 is a hard limit. Configurability of this value is planned upstream, though.

Describe the solution you'd like

Please track the upstream feature (open-telemetry/opentelemetry-rust#1951) and expose a configuration variable that allows increasing / adjusting the cardinality limit. Maybe this could be part of the umbrella issue #3226?

Describe alternatives you've considered

Moving away from adding ever higher cardinality to the metrics and switching to access logs that carry all the fields certainly makes sense at some point, provided one has capable log shipping and aggregation in place.
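
As a rough sketch of that alternative, the high-cardinality fields could be written to a structured access log line instead of being attached as metric labels. The function name `access_log_line` and all field names below are hypothetical and not taken from the router's actual log format.

```python
import json


def access_log_line(method, path, status, client_name, duration_ms):
    """Serialize one request as a JSON log line; every field name here is illustrative."""
    record = {
        "method": method,
        "path": path,
        "status": status,
        "client_name": client_name,  # high-cardinality value kept out of metric labels
        "duration_ms": duration_ms,
    }
    return json.dumps(record, sort_keys=True)


line = access_log_line("POST", "/graphql", 200, "mobile-app-4.2.1", 12.5)
print(line)
```

The metric itself can then stay low-cardinality (for example, labelled by status only), while per-client detail is recovered from the logs by the aggregation backend.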

Additional context
