Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix span leak for inbound & outbound #922

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

ChenX1993
Copy link
Contributor

@ChenX1993 ChenX1993 commented Aug 26, 2024

Problem

Span leak is a issue in observability tracing where a span is started, but is not finished. The span will not be sent to tracing backend if it is not finished, hence causing incomplete traces.

span := tracer.StartSpan(...)
...
span.Finish()   // span leak happens when this line is not executed.

Internally at Uber, we fond incomplete traces from time to time due to span leak caused in tchannel-go library. In the happy path, the span is properly finished, but the span leak issue in tchannel-go happens in error path when a function early returns with error.
E.g.

call, err := c.startCall(ctx, thriftService+"::"+methodName, &tchannel.CallOptions{
Format: tchannel.Thrift,
RequestState: rs,
})
if err != nil {
return err
}
if err := writeArgs(call, headers, req); err != nil {
return err
}
respHeaders, isOK, err = readResponse(call.Response(), resp)
return err

In above code snippet, A span is created within c.startCall, and is finished within readResponse when reader.Close() is executed. If there is an error occurred in writeArgs, then the span leak happens.

This PR is to fix the span leak issue.

Change Summary

  1. Add span component tag following the OpenTracing semantic conventions to indicate the software package that created spans.
  2. For inbound:
    2.1. Add finishSpanWithOptions method that allows span.Finish() to be invoked many times in InboundCallResponse.
    2.2 . Use finishSpanWithOptions in dispatchInbound to always try to finish a span after the handler is called.
  3. For outbound:
    3.1. Add finishSpanWithOptions in OutboundCallResponse similar to inbound case
    3.2. Use finishSpanWithOptions json, thrift and raw clients.

Next

At Uber, YARPC (yarpc-go) is often used as a umbrella RPC framework with gRPC, TChannel and HTTP as transports. When YARPC is used with TChannel, the span leak issue does also exist.

With this PR, the span leak in inbound is expected to be fixed for YARPC as well, but for outbound, we need to add defer call.Response().Done here in yarpc-go after this PR is merged.

Copy link

codecov bot commented Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.83%. Comparing base (49a0061) to head (6e0853b).
Report is 3 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #922      +/-   ##
==========================================
- Coverage   89.15%   88.83%   -0.33%     
==========================================
  Files          43       43              
  Lines        4517     4548      +31     
==========================================
+ Hits         4027     4040      +13     
- Misses        373      389      +16     
- Partials      117      119       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant