
feat(localenv): span metrics generation #2849

Merged
merged 3 commits into main from bc/2802/investigate-span-metrics-generator on Aug 9, 2024

Conversation

@BlairCurrey (Contributor) commented Aug 7, 2024

Changes proposed in this pull request

  • Configures Tempo to generate metrics based on spans (a config sketch follows this list)
  • Adds a visualization of 95th-percentile GraphQL resolver durations to the localenv dashboard (not applicable to the live dashboard because tracing is local-only at the moment)
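
For reference, here is a minimal sketch of the kind of Tempo `metrics_generator` configuration this enables, with the `span-metrics` processor remote-writing into the local Prometheus. The URL, storage path, and label values below are illustrative assumptions, not necessarily the exact values in this PR:

```yaml
# Sketch of a Tempo metrics_generator setup (illustrative values).
metrics_generator:
  registry:
    external_labels:
      source: tempo                 # label added to every generated series
  storage:
    path: /tmp/tempo/generator/wal  # assumed local WAL path
    remote_write:
      # assumes the local Prometheus has its remote-write receiver enabled
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  # Enable the span-metrics processor for all tenants.
  metrics_generator_processors: [span-metrics]
```

The `span-metrics` processor is what emits series such as `traces_spanmetrics_latency_bucket`, which the new dashboard panel queries.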

Visualization preview:

[screenshot: bar gauge panel showing 95th-percentile GraphQL resolver durations]

Context

fixes: #2848

Checklist

  • Related issues linked using fixes #number
  • Tests added/updated
  • Documentation added
  • Make sure that all checks pass
  • Bruno collection updated


netlify bot commented Aug 7, 2024

Deploy Preview for brilliant-pasca-3e80ec canceled.

🔨 Latest commit: 95d2eef
🔍 Latest deploy log: https://app.netlify.com/sites/brilliant-pasca-3e80ec/deploys/66b514d4ad177d000843eb89

@BlairCurrey (Contributor, Author) commented Aug 7, 2024

I considered whether we wanted to add visualizations for each resolver, such as stat panels for the 25th, 50th, and 95th percentiles, or a heatmap/histogram like we have for the pay times, but opted not to.

First, I feel like we will better understand what details we need once we actually consume these (as part of performance testing analysis?). Second, I think we mostly care about the extreme high end (i.e. the 95th, 99th percentile, etc.), in which case maybe we just add another bar gauge like the included one but for the 99th percentile.

Open to other ideas for what visualizations we need for this, but I think this one gives us the gist of what we're looking for.
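
As a sketch, that 99th-percentile bar gauge could reuse the `histogram_quantile` query from the panel added in this PR, with only the quantile changed:

```promql
# Sketch: same query as the new 95th-percentile panel, with the quantile bumped to 0.99.
histogram_quantile(
  0.99,
  sum(rate(traces_spanmetrics_latency_bucket{span_name=~"^(mutation|query).*"}[$__rate_interval])) by (le, span_name)
)
```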

@BlairCurrey marked this pull request as ready for review August 7, 2024 18:40
@JoblersTune (Collaborator) commented:

> live dashboard not applicable because tracing is local only atm

I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

"refId": "A"
}
],
"title": "Panel Title",
Contributor (review comment on the panel JSON above):

Let's update this title.

"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(traces_spanmetrics_latency_bucket{span_name=~\"^(mutation|query).*\"}[$__rate_interval])) by (le, span_name))",
@mkurapov (Contributor) commented Aug 8, 2024:

Should this be something other than `$__rate_interval`, and instead the selected interval of the dashboard? That way you can see the timings per the last x minutes/seconds, etc.

@BlairCurrey (Contributor, Author) replied Aug 8, 2024:

> should this be something other than `$__rate_interval`, but instead the selected interval of the dashboard?

From what I can tell, it does factor in the current time range. I spun up the localenv, ran some queries, and saw the data in this visualization with a 5m time range. I waited 5m+ and saw no data until I bumped to a 15m time range.

I'm also seeing `$__rate_interval` generally recommended as a starting point for the rate argument.
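
For comparison, here is a sketch, assuming Grafana's built-in `$__range` variable, of how the quantile could instead be computed over the entire selected dashboard range, which is roughly what the suggestion above describes; it is an illustration, not something added in this PR:

```promql
# Sketch: compute the quantile over the whole selected dashboard time range
# using Grafana's built-in $__range variable.
histogram_quantile(
  0.95,
  sum(rate(traces_spanmetrics_latency_bucket{span_name=~"^(mutation|query).*"}[$__range])) by (le, span_name)
)
```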

@mkurapov (Contributor) commented Aug 8, 2024

> I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

Mainly for performance testing and debugging

@BlairCurrey (Contributor, Author) commented Aug 8, 2024

> live dashboard not applicable because tracing is local only atm

> I'm curious exactly what our plan is with the local dashboard? Are we using it for dev? Are we using it to measure only certain local metrics? It's not exactly clear to me.

I'm mostly using it to validate the metric collection and develop visualizations. If it were applicable to the live version, I would add them there after merging this (although I guess technically it wouldn't have any data until the next release). I don't think we need to maintain parity with the live version or have examples for every single metric, but it's nice to have some basic proof-of-concept visualizations for the different types of metrics (traces, histograms, counts, etc.), IMO.

Thinking back to our conversation about development workflow, I think in theory it would be nice to develop locally, commit, and then publish to Grafana from CI. This would unify it with our general change workflow, and it would be version controlled. But I'm not sure it's worth the setup, tbh.

@BlairCurrey merged commit 53846d6 into main Aug 9, 2024
42 checks passed
@BlairCurrey deleted the bc/2802/investigate-span-metrics-generator branch August 9, 2024 16:58
sabineschaller pushed a commit that referenced this pull request Aug 15, 2024
* feat(localenv): add span metric generation

- adds configuration that generates span metrics from tempo traces
- can see new `traces_spanmetrics_bucket` etc. in local grafana dashboard

* feat(localenv): add gql resolver metric

* chore(localenv): give panel title
Successfully merging this pull request may close these issues:

  • Configure Telemetry for Span Metrics Generation and add Visualizations