Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize allowlist handling #3804

Merged
merged 2 commits into from
Sep 20, 2024
Merged

Conversation

robinkar
Copy link
Contributor

Thanks to the benchmark setup provided by Jeff in #3026, we were able to spend some time attempting to optimize the file browser file listing.
These optimizations were low hanging fruit with a decent impact on the runtime. It now returns early if no allowlist is defined (always allowed), and child paths are now determined by using strings rather than Pathname.

With an allowlist (/users:/scratch:/projappl, listing files in /tmp), there was a 2x speedup (50k files, 5 runs, 100s => 50s). Without an allowlist, the speedup was >3x (50k files, 5 runs, 49s => 14s).
The speedup depends heavily on the situation though, and the listing is only a part of the problem, as seen in #3801.
When manually testing the performance in a browser, the response time listing a directory with 50k files with no allowlist went from ~18s to ~13s.

Benchmarks

We decided to slightly customize Jeff's benchmark setup as follows:

Benchmark setup
diff --git a/apps/dashboard/Gemfile b/apps/dashboard/Gemfile
index 113d021f..e812039e 100644
--- a/apps/dashboard/Gemfile
+++ b/apps/dashboard/Gemfile
@@ -79,3 +79,5 @@ gem "sinatra-contrib", require: false
 gem "erubi", require: false
 gem "dalli", require: false
 
+gem 'rails-perftest', :git => 'https://github.com/rails/rails-perftest.git'
+gem 'ruby-prof'
diff --git a/apps/dashboard/test/performance/posix_file_test.rb b/apps/dashboard/test/performance/posix_file_test.rb
new file mode 100644
index 00000000..0c536847
--- /dev/null
+++ b/apps/dashboard/test/performance/posix_file_test.rb
@@ -0,0 +1,39 @@
+require 'test_helper'
+require 'rails/performance_test_help'
+
+N_RUNS = ENV.fetch("N_RUNS", 30).to_i
+N_FILES = ENV.fetch("N_FILES", 300).to_i
+N_CACHE_USES = ENV.fetch("N_CACHE_USES", 0).to_i
+
+FILES_PATH = ENV.fetch("FILES_PATH", "/tmp/file-profile-test")
+OUTPUT_PATH = ENV.fetch("OUTPUT_PATH", "/tmp/performance")
+
+class PosixFileTest < ActionDispatch::PerformanceTest
+  # Refer to the documentation for all available options #[:wall_time, :memory, :cpu_time, :process_time],
+  self.profile_options = { runs: N_RUNS, metrics: [:wall_time],
+                           output: OUTPUT_PATH, formats: [:flat, :graph_html, :call_tree, :call_stack] }
+
+  def setup
+    @dir = FILES_PATH
+    if N_CACHE_USES == 0
+      Dir.mkdir(@dir)
+      Dir.chdir(@dir) do
+        (0...N_FILES).each { |num| `touch foo_#{num}.txt` }
+      end
+    end
+
+    @mem = ActiveSupport::Cache::MemoryStore.new
+    Rails.stubs(:cache).returns(@mem)
+  end
+
+  def teardown
+    FileUtils.rm_rf(@dir) unless N_CACHE_USES > 0
+  end
+
+  test "listing" do
+    f = PosixFile.new(@dir)
+    for _ in 0..(N_CACHE_USES) do
+      f.ls
+    end
+  end
+end
The benchmarks without allowlist were run as:
bin/rake test N_RUNS=5 N_FILES=50000 TEST=test/performance/posix_file_test.rb

The benchmarks with allowlist were run as:

bin/rake test OOD_ALLOWLIST_PATH="/users:/scratch:/projappl" N_RUNS=5 N_FILES=50000 TEST=test/performance/posix_file_test.rb

No allowlist, before PR:
no_allowlist_unopt

No allowlist, after PR:
no_allowlist_opt

With allowlist, before PR:
allowlist_unopt

With allowlist, after PR:
allowlist_opt

Note that these benchmarks always run with an empty allowlist cache, i.e. the time taken by ActiveSupport::Cache::Store#fetch includes evaluating the new cache value. N_CACHE_USES in the benchmarks can be increased to see the effect when the cache is actually used.

- Return early if allowlist is disabled.
- Handle child path evaluation as strings.
# Child path shorter than parent => not a child.
return false if c.size < p.size
# Child path is same as parent path, or has additional components after parent path (has "/" after parent path).
c[0...p.size] == p && (c.size == p.size || c[p.size] == "/")
Copy link
Contributor

@johrstrom johrstrom Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has also bee no my mind lately. Isn't this whole expression c.start_with?(p)? Looks like start_with is a c function so it may be even faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true indeed. Didn't seem to be faster, but it is simpler for sure.

@CSC-swesters CSC-swesters mentioned this pull request Sep 20, 2024
@johrstrom johrstrom closed this Sep 20, 2024
@johrstrom johrstrom reopened this Sep 20, 2024
@johrstrom johrstrom merged commit a40d82a into OSC:master Sep 20, 2024
46 of 47 checks passed
johrstrom pushed a commit that referenced this pull request Sep 23, 2024
* Optimize allowlist handling

- Return early if allowlist is disabled.
- Handle child path evaluation as strings.

* Use start_with?
johrstrom added a commit that referenced this pull request Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants