Device::create_buffer is sometimes slow (4ms) and slows down rendering. #5984
Comments
What do you mean by Linux 22.04 LTS? The latest version of the Linux kernel is 6.10. |
Ubuntu 22.04 LTS |
I think wgpu should do a better job of documenting that this is in fact known to be a very slow operation. That said, it still needs to be looked at whether it really has to lock out render pass recording (or vice versa, not that it matters :)). Ideally, it would only be an "occasionally very slow" operation, i.e. slow only when it actually bottoms out to an allocation in the driver (which shouldn't happen all that often)! |
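To illustrate the "only occasionally slow" idea from the application side, here is a minimal sketch of a buffer pool that reuses freed buffers by size class, so that only a pool miss reaches the expensive allocation path. Everything in it is hypothetical: `FakeBuffer` stands in for a real `wgpu::Buffer`, and `BufferPool`, `acquire`, `release` and `driver_allocations` are names invented for this example, not part of wgpu.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a GPU buffer; real code would wrap `wgpu::Buffer`.
#[derive(Debug)]
struct FakeBuffer {
    capacity: u64,
}

/// Reuses freed buffers by power-of-two size class, so only a pool miss
/// takes the (assumed expensive) allocation path.
#[derive(Default)]
struct BufferPool {
    free: HashMap<u64, Vec<FakeBuffer>>,
    /// Counts how often the slow path was taken.
    driver_allocations: u32,
}

impl BufferPool {
    fn acquire(&mut self, size: u64) -> FakeBuffer {
        // Round up so that similar sizes share one size class.
        let class = size.max(1).next_power_of_two();
        if let Some(buf) = self.free.get_mut(&class).and_then(|v| v.pop()) {
            return buf; // fast path: reuse a previously released buffer
        }
        // Slow path: this is where a real pool would call `create_buffer`.
        self.driver_allocations += 1;
        FakeBuffer { capacity: class }
    }

    fn release(&mut self, buf: FakeBuffer) {
        self.free.entry(buf.capacity).or_default().push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::default();
    let a = pool.acquire(3000); // miss: allocates a 4096-byte class buffer
    pool.release(a);
    let b = pool.acquire(4000); // hit: reuses the 4096-byte buffer
    let c = pool.acquire(5000); // miss: needs the 8192-byte class
    println!(
        "capacities {} and {}, slow-path allocations: {}",
        b.capacity, c.capacity, pool.driver_allocations
    );
}
```

Rounding up to a power of two wastes some memory in exchange for more reuse; whether that trade is acceptable for very large vertex buffers is a separate question.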
OK, I will upgrade all my code, and Rend3, and re-test. More tomorrow.
Indeed. 4ms is slow for something in the main render loop. |
The buffer in question is the vertex buffer (you can tell by it being accessed by the mesh manager). This buffer can get very large, and large allocations can take a second for us to generate, as the underlying memory allocation takes a little bit. It shouldn't block the main thread, however. |
Waiting for wgpu-egui and wgpu-profiler to catch up to wgpu 22.0.0. Both have the appropriate pull requests. |
Right. Profiling can show this happening, but extracting cross-thread cause and effect from profiling data is hard. |
The pull request to fix wgpu-profiler failed. See Wumpf/wgpu-profiler#75. A new WGPU version is needed to fix that, apparently. |
Description
Performance bottleneck found with Tracy: too much time is spent in "Device::create_buffer", and that seems to delay other threads.
This should be a fast operation, but it's taking about 4ms at times.
Repro steps
The test creates a large number of visible objects on screen, waits 10 seconds, deletes them, waits 10 seconds, etc. So capture one full create/delete cycle.
Expected vs observed behavior
The part of the code in the profiling scope "Device::create_buffer" is 1) taking as long as 4ms, and 2) locking out some other operations in the render thread. As far as I can tell, that ought to be a fast operation.
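The "locking out" effect can be reproduced in isolation with a small sketch. It uses only the standard library and is not wgpu's actual locking scheme: a plain `Mutex` stands in for whatever shared state is involved, a sleeping thread stands in for the slow allocation, and `measure_lock_wait` is a hypothetical helper name. The second thread records how long it waited for the lock, which is the kind of cross-thread delay that is hard to read off a profile directly.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Runs one slow "allocation" under a shared lock while the calling thread
/// tries to take the same lock. Returns the counter value the caller saw and
/// how long it waited for the lock.
fn measure_lock_wait(hold: Duration) -> (u32, Duration) {
    let state = Arc::new(Mutex::new(0u32));
    let (locked_tx, locked_rx) = mpsc::channel();

    let alloc_state = Arc::clone(&state);
    let allocator = thread::spawn(move || {
        let mut guard = alloc_state.lock().unwrap();
        locked_tx.send(()).unwrap(); // tell the caller the lock is now held
        thread::sleep(hold); // stand-in for a slow allocation
        *guard += 1; // record one finished allocation before unlocking
    });

    // Wait until the allocator thread definitely holds the lock.
    locked_rx.recv().unwrap();
    let start = Instant::now();
    let seen = *state.lock().unwrap(); // blocks until the allocator is done
    let waited = start.elapsed();
    allocator.join().unwrap();
    (seen, waited)
}

fn main() {
    let (seen, waited) = measure_lock_wait(Duration::from_millis(4));
    println!("render thread saw {seen} allocation(s) after waiting {waited:?}");
}
```

Timing the lock acquisition itself, rather than the work done under the lock, separates "this call is slow" from "this call makes another thread wait".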
Extra materials
Screenshot of the trace.
Full Tracy trace file:
renderbenchcreatebuffer.zip
Platform
WGPU 0.20 from crates.io
Linux 22.04 LTS.
NVidia 3070. Driver 535 (proprietary, tested)