Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEBUG-2334 Probe Notifier Worker component #4028

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

p-datadog
Copy link
Contributor

What does this PR do?
This is a background thread that notification payloads (probe
status and probe snapshots) can be submitted to.
The payloads will be batched into groups if possible, and
sent to the local agent asynchronously.

Motivation:

Initial DI implementation.

Change log entry
None

Additional Notes:

How to test the change?
Unit tests in this PR

p added 3 commits October 24, 2024 11:16
This is a background thread that notification payloads (probe
status and probe snapshots) can be submitted to.
The payloads will be batched into groups if possible, and
sent to the local agent asynchronously.
@p-datadog p-datadog requested a review from a team as a code owner October 24, 2024 21:43
@pr-commenter
Copy link

pr-commenter bot commented Oct 24, 2024

Benchmarks

Benchmark execution time: 2024-10-25 18:08:27

Comparing candidate commit b8ae0cd in PR branch di-probe-notifier with baseline commit 91d883f in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

  • 🟩 throughput [+0.589op/s; +0.598op/s] or [+9.358%; +9.513%]

Copy link
Contributor

@Strech Strech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add some small code adjustments to reduce the volume of the methods. Great job 👏🏼

Comment on lines +27 to +29
# Minimum interval between submissions.
# TODO make this into an internal setting and increase default to 2 or 3.
MIN_SEND_INTERVAL = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of adding scale here? Is it seconds, milliseconds or ..? Maybe we can name it MIN_SEND_INTERVAL_SEC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that #4012 has not yet been looked at, and I have another 2000+ lines of code pending locally, I would like to only make changes in this and other open PRs that address clear problems. I am happy to discuss adding units to times and if there is team consensus on how the units should be indicated, add them in a subsequent PR.

Comment on lines 31 to 41
def initialize(settings, agent_settings, transport)
@settings = settings
@status_queue = []
@snapshot_queue = []
@transport = transport
@lock = Mutex.new
@wake = Core::Semaphore.new
@io_in_progress = false
@sleep_remaining = nil
@wake_scheduled = false
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see agent_settings to be used, is it correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are now consumed by the transport, I removed agent settings from probe notifier worker.

lib/datadog/di/probe_notifier_worker.rb Outdated Show resolved Hide resolved
Comment on lines +55 to +60
if sleep_remaining && sleep_remaining > 0
# Recalculate how much sleep time is remaining, then sleep that long.
set_sleep_remaining
else
0
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sleep_remaining && sleep_remaining > 0
# Recalculate how much sleep time is remaining, then sleep that long.
set_sleep_remaining
else
0
end
# `set_sleep_remaining` will recalculate how much sleep time is remaining, then sleep that long.
sleep_remaining && sleep_remaining > 0 ? set_sleep_remaining : 0

Comment on lines +63 to +68
if sleep_remaining > 0
# Do not need to update @wake_scheduled here because
# wake-up is already scheduled for the earliest possible time.
wake.wait(sleep_remaining)
next
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if sleep_remaining > 0
# Do not need to update @wake_scheduled here because
# wake-up is already scheduled for the earliest possible time.
wake.wait(sleep_remaining)
next
end
# Do not need to update @wake_scheduled here because
# wake-up is already scheduled for the earliest possible time.
next wake.wait(sleep_remaining) if sleep_remaining > 0

Comment on lines 70 to 81
begin
more = maybe_send
rescue => exc
raise if settings.dynamic_instrumentation.propagate_all_exceptions

warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
end
@lock.synchronize do
@wake_scheduled = more
end
wake.wait(more ? MIN_SEND_INTERVAL : nil)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
begin
more = maybe_send
rescue => exc
raise if settings.dynamic_instrumentation.propagate_all_exceptions
warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
end
@lock.synchronize do
@wake_scheduled = more
end
wake.wait(more ? MIN_SEND_INTERVAL : nil)
end
begin
more = maybe_send
rescue => exc
raise if settings.dynamic_instrumentation.propagate_all_exceptions
warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
end
@lock.synchronize { @wake_scheduled = more }
wake.wait(more ? MIN_SEND_INTERVAL : nil)
end

Comment on lines +92 to +94
unless thread&.join(timeout)
thread.kill
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unless thread&.join(timeout)
thread.kill
end
thread.kill unless thread&.join(timeout)

Comment on lines +107 to +109
if @thread.nil? || [email protected]?
return
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if @thread.nil? || !@thread.alive?
return
end
return if @thread.nil? || !@thread.alive?

Comment on lines +123 to +124
sleep 0.25
next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sleep 0.25
next
next sleep(0.25)

Comment on lines +142 to +145
[
[:status, 'probe status'],
[:snapshot, 'snapshot'],
].each do |(event_type, event_name)|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[
[:status, 'probe status'],
[:snapshot, 'snapshot'],
].each do |(event_type, event_name)|
{status: 'probe status', snapshot: 'snapshot'}.each do |event_type, event_name|

Comment on lines +92 to +94
unless thread&.join(timeout)
thread.kill
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀 This will fail if thread is nil:

[3] pry(main)> thread = nil
=> nil
[4] pry(main)> unless thread&.join(123)
[4] pry(main)*   thread.kill
[4] pry(main)* end  
NoMethodError: undefined method `kill' for nil:NilClass
from (pry):7:in `__pry__

Comment on lines +115 to +124
if io_in_progress
# If we just call Thread.pass we could be in a busy loop -
# add a sleep.
sleep 0.25
next
elsif queues_empty
break
else
sleep 0.25
next
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to avoid the sleeping by using a condition variable to flag when the queue is empty

Comment on lines +84 to +112
context 'when three snapshots are added in quick succession' do
it 'sends two batches' do
expect(worker.send(:snapshot_queue)).to be_empty

expect(transport).to receive(:send_snapshot).once.with([snapshot])

worker.add_snapshot(snapshot)
sleep 0.1
worker.add_snapshot(snapshot)
sleep 0.1
worker.add_snapshot(snapshot)

# Since sending is asynchronous, we need to relinquish execution
# for the sending thread to run.
sleep(0.1)

# At this point the first snapshot should have been sent,
# with the remaining two in the queue
expect(worker.send(:snapshot_queue)).to eq([snapshot, snapshot])

sleep 0.4
# Still within the cooldown period
expect(worker.send(:snapshot_queue)).to eq([snapshot, snapshot])

expect(transport).to receive(:send_snapshot).once.with([snapshot, snapshot])

sleep 0.5
expect(worker.send(:snapshot_queue)).to eq([])
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, avoid using sleeps in tests -- they make the test suite both slower and flakier >_>

@p-datadog p-datadog added the dev/internal Other internal work that does not need to be included in the changelog label Oct 25, 2024
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 86.93182% with 23 lines in your changes missing coverage. Please review.

Project coverage is 97.82%. Comparing base (91d883f) to head (b8ae0cd).
Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
lib/datadog/di/probe_notifier_worker.rb 77.66% 23 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4028      +/-   ##
==========================================
- Coverage   97.86%   97.82%   -0.04%     
==========================================
  Files        1321     1324       +3     
  Lines       79326    79501     +175     
  Branches     3934     3958      +24     
==========================================
+ Hits        77631    77775     +144     
- Misses       1695     1726      +31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev/internal Other internal work that does not need to be included in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants