Replies: 16 comments
-
@jongreenberg-ea mentioned a related issue filed against DXC w.r.t. I think this raises a good point: code composability issues are exacerbated in a world with work graphs.
-
The way we solve that is via an There's a funny bug in codegen in DXC which causes
-
In SPIR-V land there's a capability/extension
-
I don't think this is relevant to this proposal, which I would rather keep to a minimal surface area if possible (lambdas feel like an entirely separate matter, which would require some form of reference semantics anyway).
This is probably more relevant to the proposal related to
-
Functor structs, a.k.a. the precursors to C++11 lambdas, work just fine for us.
-
Please show a snippet of code where, in the function body, you are able to perform an interlocked compare exchange on data passed into the function that may be LDS memory, memory in a structured buffer, or some other buffer type. I can imagine the code contortions one might need to make this possible, but restructuring all code to become templates or to perform interlocked operations through proxy objects is pretty undesirable in the general case.
-
Voila: https://godbolt.org/z/9qso3Y7eW see it emits:
It's a codegen bug, but it works... for now
-
That isn't what I asked for unfortunately -- I thought it was pretty obvious I was replying to your somewhat snarky "work just fine for us" comment. As for the codegen bug, obviously this should not be relied on, and is better discussed as an actual DXC issue?
-
Well it's not my fault if you can't ask the question plainly and clearly.
There's already a DXC issue, but it's unlikely to get "fixed" until references are in.
-
Also look, same function takes either shared or RWStructuredBuffer
-
If you scroll up, I am replying directly to this claim:
I'm not particularly interested in engaging in further discussion with you but with respect to:
Yes, obviously you can modify the snippet to exploit the same codegen bug, but this is not the same as demonstrating a snippet that works within intended language semantics using your functor struct method.
-
I have a whole library of them. Write exactly what you want and I can give you an example.
-
DO NOT rely on this. If you do, your code will break in the near future. I have a draft PR to fix this issue in the SPIR-V code generator and fix a bunch of related issues in the DXIL code generator (microsoft/DirectXShaderCompiler#5249). The PR is currently blocked on resolving some related issues with matrix orientation annotations, but we are hoping to get both of those fixes into a DXC release this fall.
-
@devshgraphicsprogramming, some of your comments on this thread (and others) are pretty snarky and contain non-productive language. I think you've brought up a lot of great feedback across these discussions, but it would be better if you could restrain the snark. This project and DXC are both governed under the Microsoft Open Source Code of Conduct, and some of your comments here are getting awfully close to crossing into unacceptable territory. I ask that you please focus on being kind, respectful and constructive in your comments.
-
Sure thing, it's a street that goes both ways. I can obviously see what's wrong with my second comment, as I was obviously unkind, but I don't get the issue with the first and the last. I'm not poking fun at anyone by using "fixed" in quotation marks, I'm just conveying that I like it being "broken" :D
-
I was actually trying to write a
https://godbolt.org/z/8Ehv9e46W
Furthermore it would never be a good solution because I don't know what offset the
This is something much simpler to track with SPIR-V codegen, because that can actually track the types
It would make sense for the HLSL compiler to introduce a native Workgroup pointer type, and define whether:
-
This is a general issue I wanted to file to encourage open discussion about future improvements to `groupshared` LDS/shmem usage and allocation.

Problem Discussion

This is unlikely exhaustive, but here are the problems that are likely to crop up for users of `groupshared` memory (I will use `groupshared`, LDS, and shmem interchangeably).

1. [...] `WaveGetLaneCount` is not a compile-time constant, the wave size cannot readily be used in a `groupshared` declaration. The current workaround for this is to either make (bad) assumptions about the wave size, or produce multiple specializations of each shader and dispatching the correct one with a matching `WaveSize` at runtime. Neither option is ideal, with the former resulting in brittle hardware-specific code, and the latter resulting in build and runtime complexity.
2. `groupshared` data must be declared in the global declaration context. This impedes code composability -- for example, if we wanted to include a header to use a function defined in that header, we may hurt occupancy by inadvertently dragging along `groupshared` declarations. This is a real footgun in larger (and sometimes smaller) codebases, and it is difficult to detect when it occurs (or at least, it takes some work to understand why occupancy is lower than expected).
3. [...] `groupshared` variable as declared, due to the lack of a user-accessible `ref` function parameter qualifier. A function that operates over some input data and exports output data cannot always rely on "copy-in" and "copy-out" semantics, because the `in` and `out` semantics do not permit usage of the various `Interlocked*` functions (which internally are modeled using `ref`-qualified parameters)

Possible Solutions

If possible, I think there are a few things that would immediately improve quality-of-life for compute shader authors. These suggestions are written from the perspective of an ISV (the shader writer), and it's understood that other solutions may end up being preferable due to practicality, performance, ease-of-implementation, or all of the above from the perspective of a hardware vendor or DXC compiler implementer.

1. [...] `WaveGetLaneCount` in the declaration type for `groupshared` storage.
2. [...] `groupshared`.
3. [...] `ref` keyword that would permit use of `Interlocked*` intrinsics for ref-qualified parameters

The first item would allow developers to conceptually treat `WaveGetLaneCount` as a `constexpr` function, whose value is realized only when a PSO is actually created at runtime. This has implications beyond LDS allocation, but would be a very useful tool in the toolbox for other use cases.

For the second item, because functions are still fully unrolled currently, the total storage needed per-thread-group for a given compute shader should still be statically known, although DXIL may require modifications to properly alias types allocated from the virtual shmem pool. The idea here is that a static analysis pass would determine the amount of LDS memory needed in the "middle swell" of the program, accounting for all possible branches taken where `groupshared` variables are declared.

The counterargument to the second item is that statically knowing how much LDS is used precludes future HLSL code in a world where function calling is possible. At this point, one option would be to permit functions to allocate LDS (similar to `alloca`) using the same semantics as locally declared `groupshared` variables. The driver would need to be able to suspend thread groups if LDS isn't available, or possibly demote allocated LDS to slower vram (possibly from a fixed size pool of reserved memory).

The last item addresses the ability to perform operations on memory in LDS, regardless of where or how that LDS memory was allocated.
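
To make the `Interlocked*` limitation concrete, here is a rough sketch of the proxy-object style that is available today without relying on any codegen quirks. It assumes HLSL 2021 (for templates), and every name in it (`gs_Slots`, `g_Slots`, `g_Winners`, `LdsSlot`, `BufferSlot`, `TryClaim`) is made up for illustration. Note that the generic function never receives the memory itself, only an object that knows which global to touch, which is exactly the restructuring a `ref` qualifier would make unnecessary.

```hlsl
// Sketch only: assumes HLSL 2021 templates are enabled. All names are hypothetical.
RWStructuredBuffer<uint> g_Slots   : register(u0); // device-memory slots, 0 == free
RWStructuredBuffer<uint> g_Winners : register(u1); // counts threads that claimed both
groupshared uint gs_Slots[4];                      // LDS slots, 0 == free

// Proxy for a slot in LDS. The interlocked call names the global directly,
// because a function parameter cannot alias groupshared memory.
struct LdsSlot
{
    uint index;
    uint CompareExchange(uint expected, uint desired)
    {
        uint original;
        InterlockedCompareExchange(gs_Slots[index], expected, desired, original);
        return original;
    }
};

// Proxy for a slot in a structured buffer, with the same interface.
struct BufferSlot
{
    uint index;
    uint CompareExchange(uint expected, uint desired)
    {
        uint original;
        InterlockedCompareExchange(g_Slots[index], expected, desired, original);
        return original;
    }
};

// Generic code has to be templated over the proxy type.
template <typename Slot>
bool TryClaim(Slot slot, uint owner)
{
    // The claim succeeds only if the slot previously held 0 (free).
    return slot.CompareExchange(0, owner) == 0;
}

[numthreads(64, 1, 1)]
void CSMain(uint gi : SV_GroupIndex)
{
    if (gi < 4)
    {
        gs_Slots[gi] = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    LdsSlot lds;
    lds.index = gi % 4;
    BufferSlot buf;
    buf.index = gi % 4;

    bool wonLds = TryClaim(lds, gi + 1);
    bool wonBuf = TryClaim(buf, gi + 1);
    if (wonLds && wonBuf)
    {
        InterlockedAdd(g_Winners[0], 1);
    }
}
```

With a `ref`-qualified parameter, `TryClaim` could presumably take the `uint` directly and the two proxy structs would disappear.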
All that said, my main goal is to encourage discussion, and not attempt to be overly prescriptive about the solutions. I think starting from a well-defined problem statement is likely step one.
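
For reference, the "multiple specializations" workaround from the first problem looks roughly like the sketch below. It assumes Shader Model 6.6 for the `WaveSize` attribute; `WAVE_SIZE`, `GROUP_SIZE`, `Input`, `Output`, and `gs_WaveSums` are placeholder names, and the wave index computation assumes lanes are assigned linearly from `SV_GroupIndex`, which is common in practice but is itself an assumption.

```hlsl
// Sketch only: compiled once per supported wave size (e.g. with WAVE_SIZE
// defined as 32 and again as 64), and the application picks the matching
// variant at runtime. All names are hypothetical.
#ifndef WAVE_SIZE
#define WAVE_SIZE 32
#endif
#define GROUP_SIZE 256
#define WAVES_PER_GROUP (GROUP_SIZE / WAVE_SIZE)

StructuredBuffer<uint>   Input  : register(t0);
RWStructuredBuffer<uint> Output : register(u0);

// The array length must be a compile-time constant, so it is derived from the
// macro rather than from WaveGetLaneCount().
groupshared uint gs_WaveSums[WAVES_PER_GROUP];

[WaveSize(WAVE_SIZE)]
[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID,
            uint3 gid  : SV_GroupID,
            uint  gi   : SV_GroupIndex)
{
    // Reduce within the wave first, then publish one partial sum per wave.
    uint waveSum = WaveActiveSum(Input[dtid.x]);

    // Assumption: waves cover consecutive ranges of SV_GroupIndex.
    uint waveIndex = gi / WAVE_SIZE;
    if (WaveIsFirstLane())
    {
        gs_WaveSums[waveIndex] = waveSum;
    }
    GroupMemoryBarrierWithGroupSync();

    // One thread combines the per-wave partial sums for the whole group.
    if (gi == 0)
    {
        uint total = 0;
        for (uint i = 0; i < WAVES_PER_GROUP; ++i)
        {
            total += gs_WaveSums[i];
        }
        Output[gid.x] = total;
    }
}
```

Every additional wave size means another compiled variant and another PSO to manage, which is the build and runtime complexity described above.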