-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: publish trace decisions through redis pubsub #1362
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some feedback, looks like it's going in the right direction.
collect/collect.go
Outdated
err error | ||
) | ||
if tr.Kept() { | ||
// TODO: late span in this case won't get HasRoot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this okay? If a trace times out without receiving the root span, a decision is made without the root and the decision is cached. Root and non-root spans should then still honour the decision, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decision is still honored. The late span just won't be decorated with metadata like span count and the it won't increase trace_send_got_root
metric
KeptReason: keptReason, | ||
SampleRate: tr.Rate(), | ||
Count: uint32(tr.SpanCount()), | ||
EventCount: uint32(tr.SpanEventCount()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's annoying we have to cast this type, would be nicer if they both uint32's
} | ||
toDelete := generics.NewSet[string]() | ||
for _, td := range tds { | ||
trace := i.cache.Get(td.TraceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should pull the trace just to check if it exists. Trying to remove items from the cache that don't exist should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. My intention is to retrieve the trace object from the cache before processing it in the send()
function. If it doesn't exist in the cache, we don't need to make sampling decision for it
triggered chan struct{} | ||
once sync.Once | ||
} | ||
i.sampleTraceCache.Record(trace, td.Kept, td.KeptReason) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this cache both for drop and kept traces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this cache can be renamed to something like TraceDecisionCache
to make it more clear
// Calculate the backoff interval using exponential backoff with a base time. | ||
backoff := time.Duration(math.Min(float64(lastBackoff)*2, float64(r.maxDelay))) | ||
// Add jitter to the backoff to avoid retry collisions. | ||
jitter := time.Duration(rand.Float64() * float64(backoff) * 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if all nodes coming online may end up with the same value? I've send backoff and jitter values use a randomness components to try to prevent two processes polling at the same time.
switch data[0] { | ||
case droppedPrefix: | ||
for _, traceID := range strings.Split(data[1], ",") { | ||
td = append(td, TraceDecision{TraceID: traceID}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the trace decision indicate kept = false here?
d670c84
to
982770c
Compare
Which problem is this PR solving?
When a sampling decision is made, the decider node needs to publish the decision alongside all metadata related to that decision to all peers.
Short description of the changes
TraceDecision
type to capture all information neededredistributeNotifier
into its own file to reduce the length of thecollect.go
filesend()
into two functionsmakeDecision
andsend
trace_decision
pubsub topic for transferring trace decisionsdrop
andkept
MockSharder
so that we can test trace ownership easier