Draft: dealing with data too large for a single buffer #6138
base: trunk
Conversation
…his doesn't support anyway)
Sorry for the long wait time for a review!
Frankly, as it exists right now, we cannot accept this example. While it does show one strategy for dealing with large datasets, after reading it the user doesn't get a good idea of why that strategy should be used and what problems it avoids compared to the more naive strategy of using larger and larger buffers. Through inline code comments and verbiage in the readme, a reader who knows nothing about any of these topics (or even the details of memory allocation) should be able to understand why this is an effective strategy.
Some things I think it should touch on:
- Large buffers may fail to allocate due to memory fragmentation.
- Growing or shrinking a dataset held in a single buffer requires copying the entire buffer contents, whereas paginated data just requires rebuilding a bind group (sketched below).
I'm not going to close this, as I do think this can be transformed into something that would be great to have.
Added a few incidental comments.
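To make the second bullet concrete, here is a minimal sketch contrasting the two growth strategies. It is illustrative only — the function names and usage flags are assumptions, not code from this PR:

```rust
// Single-buffer strategy: growing means one large allocation (which can fail
// under fragmentation) plus a copy of every existing byte.
fn grow_single(
    device: &wgpu::Device,
    encoder: &mut wgpu::CommandEncoder,
    old: &wgpu::Buffer,
    old_size: u64,
    new_size: u64,
) -> wgpu::Buffer {
    let new = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("grown buffer"),
        size: new_size,
        usage: wgpu::BufferUsages::STORAGE
            | wgpu::BufferUsages::COPY_SRC
            | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });
    encoder.copy_buffer_to_buffer(old, 0, &new, 0, old_size); // full copy
    new
}

// Paginated strategy: growing appends one fixed-size page; existing pages are
// untouched, and only the bind group over the pages needs rebuilding.
fn grow_paged(device: &wgpu::Device, pages: &mut Vec<wgpu::Buffer>, page_size: u64) {
    pages.push(device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("page"),
        size: page_size, // small, uniform allocation
        usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    }));
    // ...rebuild the bind group referencing `pages` here; no data is copied.
}
```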
@@ -0,0 +1 @@
Empty file? This example definitely needs tests.
I could use a bit of advice on the test(s) for this: the included shader relies on traversing the contiguous array (made from the multiple buffers) via an `OFFSET`; currently it needs to know this information ahead of compile time, hence the `const`s.
const OFFSET: u32 = 1u << 8u;
const BUFF_LENGTH: u32 = 1u << 25u; // length of each buffer
const NUM_BUFFERS: u32 = 2u;
const TOTAL_SIZE: u32 = BUFF_LENGTH * NUM_BUFFERS; // combined length across buffers
This is of course assuming these values would make sense for the test environment; I've chosen them based on looking at the test output from previous runs.
I do not know of a better way in WGSL to assign suitable values to these dynamically at runtime, as opposed to what I've done here. If any maintainers know a cleaner or more idiomatic way that we could tweak the shader to avoid doing this, that'd be awesome.
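One possible workaround, sketched below rather than taken from this PR, is to splice the values into the WGSL source at runtime before compiling, instead of hard-coding the `const`s. The helper and file name are hypothetical:

```rust
use std::borrow::Cow;

// Hypothetical helper: prepend runtime-chosen consts to a WGSL template that
// deliberately omits these declarations, then compile the result.
fn build_shader(
    device: &wgpu::Device,
    offset: u32,
    buff_length: u32,
    num_buffers: u32,
) -> wgpu::ShaderModule {
    let template = include_str!("shader_template.wgsl"); // hypothetical file
    let source = format!(
        "const OFFSET: u32 = {offset}u;\n\
         const BUFF_LENGTH: u32 = {buff_length}u;\n\
         const NUM_BUFFERS: u32 = {num_buffers}u;\n\
         const TOTAL_SIZE: u32 = BUFF_LENGTH * NUM_BUFFERS;\n\
         {template}"
    );
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: Some("large-buffer shader"),
        source: wgpu::ShaderSource::Wgsl(Cow::Owned(source)),
    })
}
```

WGSL `override` constants might be a cleaner alternative if they are supported for this use, but I'm not certain of their status here.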
On the fragmentation and pagination notes:
- For the fragmentation: is a warning similar to what's mentioned in the doc-string on `wgpu::Limits` enough? We could link to some external docs, perhaps https://developer.nvidia.com/docs/drive/drive-os/archives/6.0.4/linux/sdk/common/topics/graphics_content/avoiding_memory_fragmentation.html
- Perhaps a pagination example is also required that this example could link to? [I would be interested in working on that]
Cheers, I'll keep working on it.
wip: get a test going
Added a test.
Asked some questions.
Attempted to simplify.
Connections
Discussion thread on Matrix.
Description
The aim of this new example is to demonstrate taking a large input dataset, splitting it into chunks in order to move it onto the GPU, and then treating it as a single contiguous data structure once it is there.
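In sketch form, the CPU-side chunking might look like the following (illustrative names, `u32` data, and a `bytemuck` cast assumed; this is not the example's actual code):

```rust
// Split a large dataset into fixed-size chunks and upload each chunk into its
// own storage buffer.
fn upload_chunked(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    data: &[u32],
    chunk_len: usize,
) -> Vec<wgpu::Buffer> {
    data.chunks(chunk_len)
        .map(|chunk| {
            let buffer = device.create_buffer(&wgpu::BufferDescriptor {
                label: Some("data chunk"),
                size: (chunk_len * std::mem::size_of::<u32>()) as u64,
                usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
                mapped_at_creation: false,
            });
            // The final chunk may be shorter than chunk_len; writing fewer
            // bytes than the buffer holds is fine.
            queue.write_buffer(&buffer, 0, bytemuck::cast_slice(chunk));
            buffer
        })
        .collect()
    // Each buffer then becomes one binding in a bind group, and the shader
    // indexes across the bindings to treat them as one contiguous array.
}
```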
Testing
There's a single test that runs the same call made in the example itself: it allocates what should be two buffers' worth of 0s (on the CI GPU), then has the GPU add 1 to each element.
- Lengths of the input and output are asserted to be equal.
- Contents of the returned array are asserted to all be 1.
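In outline, the test's assertions amount to something like this (the entry point is a hypothetical stand-in for what the example exposes):

```rust
// Hypothetical shape of the test described above.
fn check_roundtrip(total_size: usize, run_example: impl Fn(&[u32]) -> Vec<u32>) {
    let input = vec![0u32; total_size];
    let output = run_example(&input);
    assert_eq!(input.len(), output.len()); // lengths match
    assert!(output.iter().all(|&v| v == 1)); // every element was incremented
}
```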
Checklist
- Run `cargo fmt`.
- Run `cargo clippy`. If applicable, add:
  - `--target wasm32-unknown-unknown`
  - `--target wasm32-unknown-emscripten`
- Run `cargo xtask test` to run tests.
- Add change to `CHANGELOG.md`. See simple instructions inside file.