From f0be55416f2641f9c30a09f740e7c8b0dd568c76 Mon Sep 17 00:00:00 2001 From: Ivo Anjo Date: Thu, 4 Jul 2024 13:36:00 +0100 Subject: [PATCH] [NO-TICKET] Optimize CodeProvenance#record_loaded_files to avoid allocations **What does this PR do?** This PR replaces a `Hash#find` with a `Hash#any?` in the Profiler `CodeProvenance` collector. This seemingly innocuous change actually gets rid of a bunch of memory allocations. It turns out that when `Hash#find` is being used, Ruby allocates an array to contain the `[key, value]` pair that's passed into the block, for each entry in the hash (on top of a few more objects -- but this is the biggest offender unless the hash is empty/almost empty). The VM actually has an optimization for a lot of operations, including `Hash#any?` that avoids allocating any extra memory, but because `Hash#find` is actually inherited from `Enumerable`, this optimization does not kick in. **Motivation:** Eliminate unneeded memory allocations. **Additional Notes:** Here's a very simple reproducer that shows the issue: ``` puts RUBY_DESCRIPTION def allocated_now; GC.stat(:total_allocated_objects); end hash = {a: 1, b: 2, c: 3} v = nil before_find = allocated_now 10.times { _, v = hash.find { |k, _| k == :c } } puts "Found value #{v}, allocated #{allocated_now - before_find}" before_any = allocated_now 10.times { v = nil; hash.any? { |k, val| (v = val) if k == :c } } puts "Found value #{v}, allocated #{allocated_now - before_any}" ``` and here's how it looks on the latest Ruby version: ``` ruby 3.4.0preview1 (2024-05-16 master 9d69619623) [x86_64-linux] Found value 3, allocated 64 Found value 3, allocated 1 ``` (Note that there's a few more allocations going on with `Hash#find`, not only the arrays for the pairs, and we get rid of them ALL!) **How to test the change?** This operation already has test coverage. I considered adding some counting of allocated objects, but we've had quite a bunch of flakiness in some of the profiler specs when counting objects, so I've decided to not add any performance-specific tests for this. --- lib/datadog/profiling/collectors/code_provenance.rb | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/datadog/profiling/collectors/code_provenance.rb b/lib/datadog/profiling/collectors/code_provenance.rb index 239a5b4e990..ea795f9526e 100644 --- a/lib/datadog/profiling/collectors/code_provenance.rb +++ b/lib/datadog/profiling/collectors/code_provenance.rb @@ -92,8 +92,10 @@ def record_loaded_files(loaded_files) seen_files << file_path - _, found_library = libraries_by_path.find { |library_path, _| file_path.start_with?(library_path) } - seen_libraries << found_library if found_library + # NOTE: Don't use .find, it allocates a lot more memory (see commit that added this note for details) + libraries_by_path.any? do |library_path, library| + seen_libraries << library if file_path.start_with?(library_path) + end end end