[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

rudderbucky · 2024-11-11T03:05:25Z

Description
PR #6285 introduces performance issues for wgsl loops.

Repro steps
Compare the two bevy examples here: https://github.com/rudderbucky/bevy/tree/many_0.14 and https://github.com/rudderbucky/bevy/tree/many_0.15. In particular run cargo run --features="bevy_dev_tools" --example fps_overlay_stress in both, in fullscreen via the top left green button on macOS, and compare the fps

The former uses bevy 0.14 which is on v20 and the latter is on v23. Unfortunately to prove this PR is the specific one that causes performance issues is a bit more involved - you will need to clone the wgpu repo to checkout the PR commit, clone the naga_oil and clone the bevy repo to point all wgpu/naga/naga-oil dependencies to their local versions (in both bevy and naga_oil). Sadly a breaking change in wgpu causes bevy to stop compiling, though the fix to test the above examples is as simple as commenting out the set_bind_group lines that are causing issues.

Expected vs observed behavior
Performance decreases drastically between wgpu commits 3fda684 (PR 6285) and the one before.

Platform
Reproduced on M1 Pro 16 inch MBP and M1 Mac mini

The text was updated successfully, but these errors were encountered:

alice-i-cecile · 2024-11-11T03:52:00Z

#6285 by @jimblandy is the PR in question. I remember you talking about this technique and I'm very surprised to see it have severe performance implications.

Of note, the perf regression (and infinite loop fix that caused it) are both on Mac as far as I understand. Reproduction on other platforms would likely be useful.

jimblandy · 2024-11-11T04:51:20Z

First of all, thank you very much @rudderbucky for tracking this down! Isolating it to a particular commit is very helpful.

That commit shouldn't have any effect on any platform other than Metal. The only change is to the Metal Shading Language that Naga generates for loops.

The LOOP_IS_REACHABLE kludge is actually only needed for running untrusted code, as in a browser. I'll bet that most people writing native apps control their shaders (i.e., they're not writing shader playgrounds), and probably don't care if the MSL compiler does strange things when the shader has an infinite loop - they'll just remove the infinite loop.

We want to prefer safe defaults until we know what their runtime cost is. @rudderbucky, how much of a slowdown are you seeing? You say it's "drastic", and I believe you, but let's get some numbers into the issue.

rudderbucky · 2024-11-11T04:55:18Z

@jimblandy FWIW gating the feature to browsers would work perfectly for me because I am running native. On my MBP I am seeing in the example a drop from ~200FPS to 75FPS. Here's another set of numbers from a different tester, who seemed to have more problems on HiDPI display settings:

M1 Mac mini, Sonoma 14.5

many_0.15
312FPS: Release 1280x720
268FPS: Debug 1280x720
 76FPS: Release 1280x720 (HiDPI)
 76FPS: Debug 1280x720 (HiDPI)

many_0.14
1230FPS: Release 1280x720
 340FPS: Debug 1280x720
 550FPS: Release 1280x720 (HiDPI)
 301FPS: Debug 1280x720 (HiDPI)

jimblandy · 2024-11-11T05:04:10Z

That impact is serious enough to merit a workaround.

To confirm that the LOOP_IS_REACHABLE kludge is the problem, could you try applying the following patch to the Naga sources, and see if it fixes the performance?

diff --git a/naga/src/back/msl/writer.rs b/naga/src/back/msl/writer.rs
index 83d937d48..d776e7c37 100644
--- a/naga/src/back/msl/writer.rs
+++ b/naga/src/back/msl/writer.rs
@@ -803,14 +803,7 @@ impl<W: Write> Writer<W> {
         }
 
         self.loop_reachable_macro_name = self.namer.call("LOOP_IS_REACHABLE");
-        let loop_reachable_volatile_name = self.namer.call("unpredictable_jump_over_loop");
-        writeln!(
-            self.out,
-            "#define {} if (volatile bool {} = true; {})",
-            self.loop_reachable_macro_name,
-            loop_reachable_volatile_name,
-            loop_reachable_volatile_name,
-        )?;
+        writeln!(self.out, "#define {}", self.loop_reachable_macro_name,)?;
 
         Ok(())
     }

If that brings performance back up, then we can be confident that it's worth making this conditional.

rudderbucky · 2024-11-11T05:09:20Z

@jimblandy yep just confirmed that doing that fixes performance on my end back to ~200FPS.

jimblandy · 2024-11-11T05:51:39Z

Okay, then I think we've caught the culprit.

I think the fix is to have Naga give that macro an empty definition unless some sort of bounds checks are enabled. Then native apps should use create_shader_module_unchecked to avoid the performance impact.

rudderbucky · 2024-11-11T16:22:00Z

@jimblandy err just read this... I took the naive route and just gated the code you mentioned to wasm32 in #6520. Let me know if you'd like it done the way you mentioned.

github-project-automation bot moved this to Todo in WebGPU for Firefox Nov 11, 2024

github-project-automation bot added this to WebGPU for Firefox Nov 11, 2024

rudderbucky mentioned this issue Nov 11, 2024

Performance regression between 0.15 rc2 and rc3 bevyengine/bevy#16315

Open

Wumpf changed the title ~~[Metal] PR 6285 introduces performance issues~~ [Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression Nov 11, 2024

rudderbucky linked a pull request Nov 11, 2024 that will close this issue

Gate MSL infinite loop optimization workaround to a flag enabled by default #6520

Open

b0nes164 mentioned this issue Nov 12, 2024

[naga]Naga does not generate for loops #6521

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

rudderbucky commented Nov 11, 2024 •

edited by Wumpf

Loading

alice-i-cecile commented Nov 11, 2024 •

edited

Loading

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024 •

edited

Loading

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024

[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

Comments

rudderbucky commented Nov 11, 2024 • edited by Wumpf Loading

alice-i-cecile commented Nov 11, 2024 • edited Loading

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024 • edited Loading

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024

jimblandy commented Nov 11, 2024

rudderbucky commented Nov 11, 2024

rudderbucky commented Nov 11, 2024 •

edited by Wumpf

Loading

alice-i-cecile commented Nov 11, 2024 •

edited

Loading

rudderbucky commented Nov 11, 2024 •

edited

Loading