
Device::create_buffer is sometimes slow (4ms) and slows down rendering. #5984

Open · John-Nagle opened this issue Jul 18, 2024 · 8 comments
Labels: area: performance (How fast things go)

Comments

@John-Nagle

Description
A performance bottleneck found with Tracy: too much time is spent in the "Device::create_buffer" scope, and that seems to delay work on other threads.
This should be a fast operation, but it sometimes takes about 4 ms.

Repro steps

  1. Get render-bench and build the "hp" branch in release mode.
  2. Run it under the Tracy profiler 0.10.
  3. Zoom in on the slowest frames in the profiler.

The test creates a large number of visible objects on screen, waits 10 seconds, deletes them, waits 10 seconds, and repeats. Capture one full create/delete cycle.

Expected vs observed behavior
The code in the profiling scope "Device::create_buffer" is 1) taking as long as 4 ms, and 2) locking out other operations on the render thread. As far as I can tell, that ought to be a fast operation.
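
For context, a minimal sketch of how such a scope can be instrumented with the `profiling` crate (which wgpu uses for its own internal scopes). The function name and the descriptor values are placeholders for illustration, not what render-bench actually passes:

```rust
// Sketch only: wrap a buffer allocation in a named profiling scope so it
// shows up as a zone in Tracy. Label, size, and usage flags are placeholders.
fn allocate_buffer(device: &wgpu::Device, size: u64) -> wgpu::Buffer {
    profiling::scope!("Device::create_buffer");
    device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("bench buffer"),
        size,
        usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    })
}
```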

Extra materials
Screenshot of the trace: renderbenchcreatebuffer1 (image attachment)

Full Tracy trace file: renderbenchcreatebuffer.zip

Platform
WGPU 0.20 from crates.io
Linux 22.04 LTS.
NVIDIA RTX 3070. Driver 535 (proprietary, tested)

@partisani

What do you mean by Linux 22.04 LTS? The latest version of the Linux kernel is 6.10.

@John-Nagle
Author

Ubuntu 22.04 LTS

@Wumpf
Member

Wumpf commented Jul 18, 2024

> As far as I can tell, that ought to be a fast operation.

I think wgpu should do a better job documenting that it is in fact known to be a very slow operation.

That said, it still needs to be looked at whether it really has to lock out render pass recording (or vice versa, not that it matters :)).
wgpu is making considerable progress in this area, so it might be worth checking whether the just-released 22.0.0 got better in that regard, but I'd be a bit surprised if it's fundamentally different (but who knows! I've personally lost track of all the refactors that went in 😅).

Ideally, it would only be an "occasionally very slow" operation, i.e. whenever it actually happens to bottom out to an allocation in the driver (which shouldn't happen all that often)!
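
A minimal sketch of the mitigation that distinction implies, assuming a simple bump-allocated pool (not wgpu's or Rend3's actual allocator): create one large buffer up front and hand out ranges from it, so per-object allocations rarely bottom out in the driver. Capacity, alignment, and usage flags below are illustrative assumptions.

```rust
// Sketch only: a bump allocator over one pre-created wgpu buffer.
struct VertexPool {
    backing: wgpu::Buffer,
    cursor: u64,
    capacity: u64,
}

impl VertexPool {
    fn new(device: &wgpu::Device, capacity: u64) -> Self {
        // One driver-level allocation, paid once instead of per mesh.
        let backing = device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("pooled vertex storage"),
            size: capacity,
            usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
            mapped_at_creation: false,
        });
        Self { backing, cursor: 0, capacity }
    }

    /// Reserve `len` bytes; returns the byte offset into the backing buffer,
    /// or None when the pool is exhausted and a real (slow) allocation is needed.
    fn allocate(&mut self, len: u64) -> Option<u64> {
        let offset = (self.cursor + 3) & !3; // 4-byte alignment for buffer copies
        if offset + len > self.capacity {
            return None;
        }
        self.cursor = offset + len;
        Some(offset)
    }
}
```

As the following comments suggest, the buffer at issue here already appears to be a large shared vertex buffer managed by the mesh manager, so the slow path is likely the occasional creation or growth of that backing buffer itself.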

@Wumpf added the "area: performance (How fast things go)" label on Jul 18, 2024
@John-Nagle
Author

> just released (WGPU) 22.0.0

OK, I will upgrade all my code, and Rend3, and re-test. More tomorrow.

> very slow operation.

Indeed. 4 ms is slow for something in the main render loop.

@cwfitzgerald
Member

The buffer in question is the vertex buffer (you can tell by it being accessed by the mesh manager). This buffer can get very large, and large allocations can take a while for us to create, since the underlying memory allocation itself takes some time. It shouldn't block the main thread, however.
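
One way to read "it shouldn't block the main thread": on native backends `wgpu::Device` is Send + Sync, so a large allocation can at least be issued from a worker thread. A minimal sketch under that assumption; the `Arc` sharing, function name, and 512 MiB size are illustrative, not how Rend3 actually does it:

```rust
use std::{sync::Arc, thread};

// Sketch only: issue a large, slow allocation off the render thread.
// The size is a placeholder; the mesh manager decides the real one.
fn create_vertex_buffer_async(device: Arc<wgpu::Device>) -> thread::JoinHandle<wgpu::Buffer> {
    thread::spawn(move || {
        device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("large vertex buffer"),
            size: 512 * 1024 * 1024,
            usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
            mapped_at_creation: false,
        })
    })
}
```

The caller would `join()` the handle once the buffer is actually needed. Whether this removes the stall is exactly the open question in this thread: the trace suggests the allocation still contends with render pass recording inside wgpu.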

@John-Nagle
Author

Waiting for wgpu-egui and wgpu-profiler to catch up to wgpu 22.0.0. Both have the appropriate pull requests.

@John-Nagle
Author

> The buffer in question is the vertex buffer (you can tell by it being accessed by the mesh manager). This buffer can get very large, and large allocations can take a while for us to create, since the underlying memory allocation itself takes some time. It shouldn't block the main thread, however.

Right. Profiling can show this happening, but extracting cross-thread cause and effect from profiling data is hard.

@John-Nagle
Author

The pull request to update wgpu-profiler failed. See Wumpf/wgpu-profiler#75

Apparently a new WGPU release is needed to fix that.
