
feat(localenv): span metrics generation #2849

Merged
merged 3 commits into main from bc/2802/investigate-span-metrics-generator on Aug 9, 2024

Conversation

@BlairCurrey (Contributor) commented Aug 7, 2024

Changes proposed in this pull request

  • Configures Tempo to generate metrics based on spans (a config sketch follows this list)
  • Adds a visualization of 95th-percentile GraphQL resolver durations to the localenv dashboard (not applicable to the live dashboard because tracing is local-only at the moment)
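
For reference, here is a minimal sketch of the kind of Tempo `metrics_generator` configuration this enables, with the `span-metrics` processor remote-writing into the local Prometheus. The URL, storage path, and label values below are illustrative assumptions, not necessarily the exact values in this PR:

```yaml
# Sketch of a Tempo metrics_generator setup (illustrative values).
metrics_generator:
  registry:
    external_labels:
      source: tempo                 # label added to every generated series
  storage:
    path: /tmp/tempo/generator/wal  # assumed local WAL path
    remote_write:
      # assumes the local Prometheus has its remote-write receiver enabled
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  # Enable the span-metrics processor for all tenants.
  metrics_generator_processors: [span-metrics]
```

The `span-metrics` processor is what emits series such as `traces_spanmetrics_latency_bucket`, which the new dashboard panel queries.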

Visualization preview:

[screenshot: bar gauge panel showing 95th-percentile GraphQL resolver durations]

Context

fixes: #2848

Checklist

  • Related issues linked using fixes #number
  • Tests added/updated
  • Documentation added
  • Make sure that all checks pass
  • Bruno collection updated


netlify bot commented Aug 7, 2024

Deploy Preview for brilliant-pasca-3e80ec canceled.

🔨 Latest commit: 95d2eef
🔍 Latest deploy log: https://app.netlify.com/sites/brilliant-pasca-3e80ec/deploys/66b514d4ad177d000843eb89

@BlairCurrey (Contributor, Author) commented Aug 7, 2024

I considered whether we wanted to add visualizations for each resolver, such as stat panels for the 25th, 50th, and 95th percentiles, or a heatmap/histogram like we have for the pay times, but opted not to.

First, I feel like we will better understand what details we need once we actually consume these (as part of performance testing analysis?). Second, I think we mostly care about the extreme high end (i.e. the 95th, 99th percentile, etc.), in which case maybe we just add another bar gauge like the included one but for the 99th percentile.

Open to other ideas for what visualizations we need for this, but I think this one gives us the gist of what we're looking for.
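
As a sketch, that 99th-percentile bar gauge could reuse the `histogram_quantile` query from the panel added in this PR, with only the quantile changed:

```promql
# Sketch: same query as the new 95th-percentile panel, with the quantile bumped to 0.99.
histogram_quantile(
  0.99,
  sum(rate(traces_spanmetrics_latency_bucket{span_name=~"^(mutation|query).*"}[$__rate_interval])) by (le, span_name)
)
```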

@BlairCurrey marked this pull request as ready for review August 7, 2024 18:40
@JoblersTune (Collaborator) commented:

> live dashboard not applicable because tracing is local only atm

I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

"refId": "A"
}
],
"title": "Panel Title",
Contributor (review comment on the panel JSON above):

Let's update this title.

"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(traces_spanmetrics_latency_bucket{span_name=~\"^(mutation|query).*\"}[$__rate_interval])) by (le, span_name))",
@mkurapov (Contributor) commented Aug 8, 2024:

Should this be something other than `$__rate_interval`, and instead the selected interval of the dashboard? That way you can see the timings per the last x minutes/seconds, etc.

@BlairCurrey (Contributor, Author) replied Aug 8, 2024:

> should this be something other than `$__rate_interval`, but instead the selected interval of the dashboard?

From what I can tell, it does factor in the current time range. I spun up the localenv, ran some queries, and saw the data in this visualization with a 5m time range. I waited 5m+ and saw no data until I bumped to a 15m time range.

I'm also seeing `$__rate_interval` generally recommended as a starting point for the rate argument.
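
For comparison, here is a sketch, assuming Grafana's built-in `$__range` variable, of how the quantile could instead be computed over the entire selected dashboard range, which is roughly what the suggestion above describes; it is an illustration, not something added in this PR:

```promql
# Sketch: compute the quantile over the whole selected dashboard time range
# using Grafana's built-in $__range variable.
histogram_quantile(
  0.95,
  sum(rate(traces_spanmetrics_latency_bucket{span_name=~"^(mutation|query).*"}[$__range])) by (le, span_name)
)
```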

@mkurapov (Contributor) commented Aug 8, 2024

> I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

Mainly for performance testing and debugging

@BlairCurrey (Contributor, Author) commented Aug 8, 2024

> live dashboard not applicable because tracing is local only atm

> I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

I'm mostly using it to validate the metric collection and develop visualizations. If it were applicable to the live version, I would add them there after merging this (although I guess technically it wouldn't have any data until the next release). I don't think we need to maintain parity with the live version or have examples for every single metric, but it's nice to have some basic proof-of-concept visualizations for the different types of metrics (traces, histograms, counts, etc.), IMO.

Thinking back to our conversation about development workflow, I think in theory it would be nice to develop locally, commit, and then publish to Grafana from CI. This would unify it with our general change workflow, and it would be version controlled. But I'm not sure it's worth the setup, tbh.

@BlairCurrey merged commit 53846d6 into main Aug 9, 2024
42 checks passed
@BlairCurrey deleted the bc/2802/investigate-span-metrics-generator branch August 9, 2024 16:58
sabineschaller pushed a commit that referenced this pull request Aug 15, 2024
* feat(localenv): add span metric generation

- adds configuration that generates span metrics from tempo traces
- can see new `traces_spanmetrics_bucket` etc. in local grafana dashboard

* feat(localenv): add gql resolver metric

* chore(localenv): give panel title
Successfully merging this pull request may close these issues:

  • Configure Telemetry for Span Metrics Generation and add Visualizations