Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression #6518

Open
rudderbucky opened this issue Nov 11, 2024 · 7 comments · May be fixed by #6520

Comments

@rudderbucky
Copy link

rudderbucky commented Nov 11, 2024

Description
PR #6285 introduces performance issues for wgsl loops.

Repro steps
Compare the two bevy examples here: https://github.com/rudderbucky/bevy/tree/many_0.14 and https://github.com/rudderbucky/bevy/tree/many_0.15. In particular run cargo run --features="bevy_dev_tools" --example fps_overlay_stress in both, in fullscreen via the top left green button on macOS, and compare the fps

The former uses bevy 0.14 which is on v20 and the latter is on v23. Unfortunately to prove this PR is the specific one that causes performance issues is a bit more involved - you will need to clone the wgpu repo to checkout the PR commit, clone the naga_oil and clone the bevy repo to point all wgpu/naga/naga-oil dependencies to their local versions (in both bevy and naga_oil). Sadly a breaking change in wgpu causes bevy to stop compiling, though the fix to test the above examples is as simple as commenting out the set_bind_group lines that are causing issues.

Expected vs observed behavior
Performance decreases drastically between wgpu commits 3fda684 (PR 6285) and the one before.

Platform
Reproduced on M1 Pro 16 inch MBP and M1 Mac mini

@alice-i-cecile
Copy link

alice-i-cecile commented Nov 11, 2024

#6285 by @jimblandy is the PR in question. I remember you talking about this technique and I'm very surprised to see it have severe performance implications.

Of note, the perf regression (and infinite loop fix that caused it) are both on Mac as far as I understand. Reproduction on other platforms would likely be useful.

@jimblandy
Copy link
Member

First of all, thank you very much @rudderbucky for tracking this down! Isolating it to a particular commit is very helpful.

That commit shouldn't have any effect on any platform other than Metal. The only change is to the Metal Shading Language that Naga generates for loops.

The LOOP_IS_REACHABLE kludge is actually only needed for running untrusted code, as in a browser. I'll bet that most people writing native apps control their shaders (i.e., they're not writing shader playgrounds), and probably don't care if the MSL compiler does strange things when the shader has an infinite loop - they'll just remove the infinite loop.

We want to prefer safe defaults until we know what their runtime cost is. @rudderbucky, how much of a slowdown are you seeing? You say it's "drastic", and I believe you, but let's get some numbers into the issue.

@rudderbucky
Copy link
Author

rudderbucky commented Nov 11, 2024

@jimblandy FWIW gating the feature to browsers would work perfectly for me because I am running native. On my MBP I am seeing in the example a drop from ~200FPS to 75FPS. Here's another set of numbers from a different tester, who seemed to have more problems on HiDPI display settings:

M1 Mac mini, Sonoma 14.5

many_0.15
312FPS: Release 1280x720
268FPS: Debug 1280x720
 76FPS: Release 1280x720 (HiDPI)
 76FPS: Debug 1280x720 (HiDPI)

many_0.14
1230FPS: Release 1280x720
 340FPS: Debug 1280x720
 550FPS: Release 1280x720 (HiDPI)
 301FPS: Debug 1280x720 (HiDPI)

@jimblandy
Copy link
Member

That impact is serious enough to merit a workaround.

To confirm that the LOOP_IS_REACHABLE kludge is the problem, could you try applying the following patch to the Naga sources, and see if it fixes the performance?

diff --git a/naga/src/back/msl/writer.rs b/naga/src/back/msl/writer.rs
index 83d937d48..d776e7c37 100644
--- a/naga/src/back/msl/writer.rs
+++ b/naga/src/back/msl/writer.rs
@@ -803,14 +803,7 @@ impl<W: Write> Writer<W> {
         }
 
         self.loop_reachable_macro_name = self.namer.call("LOOP_IS_REACHABLE");
-        let loop_reachable_volatile_name = self.namer.call("unpredictable_jump_over_loop");
-        writeln!(
-            self.out,
-            "#define {} if (volatile bool {} = true; {})",
-            self.loop_reachable_macro_name,
-            loop_reachable_volatile_name,
-            loop_reachable_volatile_name,
-        )?;
+        writeln!(self.out, "#define {}", self.loop_reachable_macro_name,)?;
 
         Ok(())
     }

If that brings performance back up, then we can be confident that it's worth making this conditional.

@rudderbucky
Copy link
Author

@jimblandy yep just confirmed that doing that fixes performance on my end back to ~200FPS.

@jimblandy
Copy link
Member

Okay, then I think we've caught the culprit.

I think the fix is to have Naga give that macro an empty definition unless some sort of bounds checks are enabled. Then native apps should use create_shader_module_unchecked to avoid the performance impact.

@Wumpf Wumpf changed the title [Metal] PR 6285 introduces performance issues [Metal/MSL] Workaround for stopping infinite loops to be optimized out causes performance regression Nov 11, 2024
@rudderbucky
Copy link
Author

@jimblandy err just read this... I took the naive route and just gated the code you mentioned to wasm32 in #6520. Let me know if you'd like it done the way you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

Successfully merging a pull request may close this issue.

3 participants