DEBUG-2334 Probe Notifier Worker component #4028

Open

p-datadog wants to merge 12 commits into master from di-probe-notifier

+475 −0

Contributor

p-datadog commented Oct 24, 2024

What does this PR do?
This PR adds a background worker thread to which notification payloads
(probe statuses and probe snapshots) can be submitted.
Payloads are batched where possible and sent to the local agent
asynchronously.
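
The batching idea described above can be illustrated with a minimal sketch. This is not the implementation in this PR: `BatchingWorker`, `send_batch`, and the `min_send_interval` keyword are hypothetical names chosen for illustration, and a plain `ConditionVariable` stands in for the `Core::Semaphore` used by the real worker.

```ruby
# Hypothetical sketch: BatchingWorker and its transport interface are
# illustrative names, not the classes added in this PR.
class BatchingWorker
  def initialize(transport, min_send_interval: 1)
    @transport = transport
    @min_send_interval = min_send_interval # seconds between submissions
    @queue = []
    @lock = Mutex.new
    @wake = ConditionVariable.new
    @stop_requested = false
    @thread = nil
  end

  def start
    @thread ||= Thread.new { run }
  end

  # Called from application threads; returns immediately.
  def add(payload)
    @lock.synchronize do
      @queue << payload
      @wake.signal
    end
  end

  # Ask the thread to finish; pending payloads are sent before it exits.
  def stop(timeout = 2)
    @lock.synchronize do
      @stop_requested = true
      @wake.signal
    end
    @thread&.join(timeout)
  end

  private

  def run
    loop do
      batch = @lock.synchronize do
        @wake.wait(@lock) while @queue.empty? && !@stop_requested
        @queue.slice!(0, @queue.length) # take everything queued so far
      end
      @transport.send_batch(batch) unless batch.empty?
      break if @lock.synchronize { @stop_requested && @queue.empty? }
      # Rate limit: payloads arriving during this pause go out together.
      sleep @min_send_interval
    end
  end
end
```

Callers only ever touch the queue under the lock, so `add` stays cheap for application threads, while all network I/O happens on the worker thread.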

Motivation:

Initial DI implementation.

Change log entry
None

Additional Notes:

How to test the change?
Unit tests in this PR

p added 3 commits

October 24, 2024 11:16


          DEBUG-2334 Probe Notifier Worker component

c901b11

This is a background thread that notification payloads (probe
status and probe snapshots) can be submitted to.
The payloads will be batched into groups if possible, and
sent to the local agent asynchronously.


          extract semaphore implementation, reimplement waiting


          types

82de6e5

p-datadog requested a review from a team as a code owner

October 24, 2024 21:43

p-datadog and others added 3 commits

October 24, 2024 17:43


          Merge branch 'master' into di-probe-notifier

f0ead5f


          set wake_scheduled

f5f2442


          rubocop

be1eae5

pr-commenter bot commented Oct 24, 2024

Benchmarks

Benchmark execution time: 2024-10-25 18:08:27

Comparing candidate commit b8ae0cd in PR branch di-probe-notifier with baseline commit 91d883f in branch master.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 23 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

🟩 throughput [+0.589op/s; +0.598op/s] or [+9.358%; +9.513%]

Strech reviewed

Contributor

Strech left a comment

I would like to suggest some small code adjustments to shorten the methods. Great job 👏🏼

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +27 to +29

+                    # Minimum interval between submissions.
+                    # TODO make this into an internal setting and increase default to 2 or 3.
+                    MIN_SEND_INTERVAL = 1

Contributor

Strech Oct 25, 2024

WDYT of adding the unit here? Is it seconds, milliseconds, or something else? Maybe we can name it MIN_SEND_INTERVAL_SEC?

Contributor Author

p-datadog Oct 25, 2024

Given that #4012 has not yet been looked at, and I have another 2000+ lines of code pending locally, I would like to only make changes in this and other open PRs that address clear problems. I am happy to discuss adding units to times and if there is team consensus on how the units should be indicated, add them in a subsequent PR.

lib/datadog/di/probe_notifier_worker.rb Outdated

Comment on lines 31 to 41

+                    def initialize(settings, agent_settings, transport)
+                      @settings = settings
+                      @status_queue = []
+                      @snapshot_queue = []
+                      @transport = transport
+                      @lock = Mutex.new
+                      @wake = Core::Semaphore.new
+                      @io_in_progress = false
+                      @sleep_remaining = nil
+                      @wake_scheduled = false
+                    end

Contributor

Strech Oct 25, 2024

I don't see agent_settings being used. Is that correct?

Contributor Author

p-datadog Oct 25, 2024

They are now consumed by the transport; I removed agent settings from the probe notifier worker.

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +55 to +60

+                            if sleep_remaining && sleep_remaining > 0
+                              # Recalculate how much sleep time is remaining, then sleep that long.
+                              set_sleep_remaining
+                            else
+                              0
+                            end

Contributor

Strech Oct 25, 2024

Suggested change

-                            if sleep_remaining && sleep_remaining > 0
-                              # Recalculate how much sleep time is remaining, then sleep that long.
-                              set_sleep_remaining
-                            else
-                              0
-                            end
+                            # `set_sleep_remaining` will recalculate how much sleep time is remaining, then sleep that long.
+                            sleep_remaining && sleep_remaining > 0 ? set_sleep_remaining : 0

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +63 to +68

+                          if sleep_remaining > 0
+                            # Do not need to update @wake_scheduled here because
+                            # wake-up is already scheduled for the earliest possible time.
+                            wake.wait(sleep_remaining)
+                            next
+                          end

Contributor

Strech Oct 25, 2024

Suggested change

-                          if sleep_remaining > 0
-                            # Do not need to update @wake_scheduled here because
-                            # wake-up is already scheduled for the earliest possible time.
-                            wake.wait(sleep_remaining)
-                            next
-                          end
+                          # Do not need to update @wake_scheduled here because
+                          # wake-up is already scheduled for the earliest possible time.
+                          next wake.wait(sleep_remaining) if sleep_remaining > 0

lib/datadog/di/probe_notifier_worker.rb Outdated

Comment on lines 70 to 81

+                          begin
+                            more = maybe_send
+                          rescue => exc
+                            raise if settings.dynamic_instrumentation.propagate_all_exceptions
+                            warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
+                          end
+                          @lock.synchronize do
+                            @wake_scheduled = more
+                          end
+                          wake.wait(more ? MIN_SEND_INTERVAL : nil)
+                        end

Contributor

Strech Oct 25, 2024

Suggested change

-                          begin
-                            more = maybe_send
-                          rescue => exc
-                            raise if settings.dynamic_instrumentation.propagate_all_exceptions
-                            warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
-                          end
-                          @lock.synchronize do
-                            @wake_scheduled = more
-                          end
-                          wake.wait(more ? MIN_SEND_INTERVAL : nil)
-                        end
+                          begin
+                            more = maybe_send
+                          rescue => exc
+                            raise if settings.dynamic_instrumentation.propagate_all_exceptions
+                            warn "Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})"
+                          end
+                          @lock.synchronize { @wake_scheduled = more }
+                          wake.wait(more ? MIN_SEND_INTERVAL : nil)
+                        end

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +92 to +94

+                      unless thread&.join(timeout)
+                        thread.kill
+                      end

Contributor

Strech Oct 25, 2024

Suggested change

-                      unless thread&.join(timeout)
-                        thread.kill
-                      end
+                      thread.kill unless thread&.join(timeout)

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +107 to +109

+                        if @thread.nil? || !@thread.alive?
+                          return
+                        end

Contributor

Strech Oct 25, 2024

Suggested change

-                        if @thread.nil? || !@thread.alive?
-                          return
-                        end
+                        return if @thread.nil? || !@thread.alive?

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +123 to +124

+                          sleep 0.25
+                          next

Contributor

Strech Oct 25, 2024

Suggested change

-                          sleep 0.25
-                          next
+                          next sleep(0.25)

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +142 to +145

+                    [
+                      [:status, 'probe status'],
+                      [:snapshot, 'snapshot'],
+                    ].each do |(event_type, event_name)|

Contributor

Strech Oct 25, 2024

Suggested change

-                    [
-                      [:status, 'probe status'],
-                      [:snapshot, 'snapshot'],
-                    ].each do |(event_type, event_name)|
+                    {status: 'probe status', snapshot: 'snapshot'}.each do |event_type, event_name|

ivoanjo reviewed

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +92 to +94

+                      unless thread&.join(timeout)
+                        thread.kill
+                      end

Member

ivoanjo Oct 25, 2024

👀 This will fail if thread is nil:

[3] pry(main)> thread = nil
=> nil
[4] pry(main)> unless thread&.join(123)
[4] pry(main)*   thread.kill
[4] pry(main)* end  
NoMethodError: undefined method `kill' for nil:NilClass
from (pry):7:in `__pry__'
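
One way to make the shutdown path nil-safe is to handle the "never started" case before joining. The following is a sketch only: `stop_thread` is a hypothetical helper, not a method from this PR, and the boolean return value is an added convenience for illustration.

```ruby
# Hypothetical helper; stop_thread is not a method from this PR.
# Returns true if the thread finished on its own or was never started,
# false if it had to be killed.
def stop_thread(thread, timeout)
  return true if thread.nil?
  # Thread#join returns nil when the timeout expires first.
  return true if thread.join(timeout)

  thread.kill
  false
end
```

The same effect could be had inline with `thread&.kill`, at the cost of hiding the distinction between "no thread" and "thread timed out".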

ivoanjo reviewed

lib/datadog/di/probe_notifier_worker.rb

Comment on lines +115 to +124

+                        if io_in_progress
+                          # If we just call Thread.pass we could be in a busy loop -
+                          # add a sleep.
+                          sleep 0.25
+                          next
+                        elsif queues_empty
+                          break
+                        else
+                          sleep 0.25
+                          next

Member

ivoanjo Oct 25, 2024

It's possible to avoid the sleeping by using a condition variable to flag when the queue is empty.
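
A rough sketch of that idea, assuming a hypothetical `FlushableQueue` class (none of these names come from this PR): the worker signals a `ConditionVariable` once nothing is queued or in flight, and the flushing caller waits on it with a deadline instead of polling with `sleep 0.25`.

```ruby
# Hypothetical sketch; FlushableQueue is an illustrative name, not code from this PR.
class FlushableQueue
  def initialize
    @items = []
    @in_flight = 0 # batches taken by the worker but not yet sent
    @lock = Mutex.new
    @drained = ConditionVariable.new
  end

  def push(item)
    @lock.synchronize { @items << item }
  end

  # Worker side: take all queued items and mark them as being sent.
  def take_batch
    @lock.synchronize do
      batch = @items.slice!(0, @items.length)
      @in_flight += 1 unless batch.empty?
      batch
    end
  end

  # Worker side: call after a non-empty batch has been sent.
  def batch_done
    @lock.synchronize do
      @in_flight -= 1
      @drained.broadcast if @items.empty? && @in_flight.zero?
    end
  end

  # Caller side: block until nothing is queued or in flight.
  # Returns false if the timeout (in seconds) elapsed first.
  def wait_until_drained(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @lock.synchronize do
      until @items.empty? && @in_flight.zero?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0
        @drained.wait(@lock, remaining)
      end
      true
    end
  end
end
```

The predicate is re-checked in a loop after every wake-up, which guards against spurious wake-ups and against items pushed between the broadcast and the waiter resuming.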

ivoanjo reviewed

spec/datadog/di/probe_notifier_worker_spec.rb

Comment on lines +84 to +112

+                    context 'when three snapshots are added in quick succession' do
+                      it 'sends two batches' do
+                        expect(worker.send(:snapshot_queue)).to be_empty
+                        expect(transport).to receive(:send_snapshot).once.with([snapshot])
+                        worker.add_snapshot(snapshot)
+                        sleep 0.1
+                        worker.add_snapshot(snapshot)
+                        sleep 0.1
+                        worker.add_snapshot(snapshot)
+                        # Since sending is asynchronous, we need to relinquish execution
+                        # for the sending thread to run.
+                        sleep(0.1)
+                        # At this point the first snapshot should have been sent,
+                        # with the remaining two in the queue
+                        expect(worker.send(:snapshot_queue)).to eq([snapshot, snapshot])
+                        sleep 0.4
+                        # Still within the cooldown period
+                        expect(worker.send(:snapshot_queue)).to eq([snapshot, snapshot])
+                        expect(transport).to receive(:send_snapshot).once.with([snapshot, snapshot])
+                        sleep 0.5
+                        expect(worker.send(:snapshot_queue)).to eq([])
+                      end

Member

ivoanjo Oct 25, 2024

If possible, avoid using sleeps in tests -- they make the test suite both slower and flakier >_>
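
One possible way to drop the sleeps from the "eventually sent" assertions is to have the test double hand batches back through a thread-safe `Queue`, so the test blocks until the worker thread has actually sent something. This is a sketch under assumptions: `QueueTransport` and `next_batch` are hypothetical names, and a bare thread stands in for the real worker.

```ruby
require "timeout"

# Hypothetical test double: records batches on a thread-safe Queue so the
# test can block until the worker thread actually sends something.
class QueueTransport
  def initialize
    @sent = Queue.new
  end

  def send_snapshot(batch)
    @sent << batch
  end

  # Blocks until a batch arrives; raises Timeout::Error after `limit` seconds.
  def next_batch(limit = 2)
    Timeout.timeout(limit) { @sent.pop }
  end
end

transport = QueueTransport.new
# Stand-in for the worker: any thread that eventually calls send_snapshot.
sender = Thread.new { transport.send_snapshot([:snapshot]) }
batch = transport.next_batch # returns as soon as the send happens
sender.join
```

This only addresses waiting for a send to occur. Asserting that nothing is sent during the cooldown period would still need some way to control the interval or the clock in the test.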

p added 5 commits

October 25, 2024 11:17


          agent settings no longer used

91c1a44


          set thread to nil initially

0b08177


          standard

07fe7bc


          get rid of agent settings

bc9192f


          dependency-inject logger

cc58ee4

p-datadog added the dev/internal label


          types

b8ae0cd

codecov-commenter commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 86.93182% with 23 lines in your changes missing coverage. Please review.

Project coverage is 97.82%. Comparing base (91d883f) to head (b8ae0cd).
Report is 12 commits behind head on master.

Files with missing lines	Patch %	Lines
lib/datadog/di/probe_notifier_worker.rb	77.66%	23 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4028      +/-   ##
==========================================
- Coverage   97.86%   97.82%   -0.04%     
==========================================
  Files        1321     1324       +3     
  Lines       79326    79501     +175     
  Branches     3934     3958      +24     
==========================================
+ Hits        77631    77775     +144     
- Misses       1695     1726      +31

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

p-datadog mentioned this pull request

DEBUG-2334 dynamic instrumentation probe notification builder #4011

Open
