Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vk::BufferPointer to HLSL #37

Merged
merged 2 commits into from
Apr 14, 2023
Merged

Add vk::BufferPointer to HLSL #37

merged 2 commits into from
Apr 14, 2023

Conversation

greg-lunarg
Copy link
Contributor

@greg-lunarg greg-lunarg commented Apr 4, 2023

Create a new first class type vk::BufferPointer which allows users to efficiently reference buffer device addresses and allows tools to analyze and report on usage in a logical rather than physical way i.e. names rather than numeric offsets.


float4 MainPs(void) : SV_Target0
{
float4 vTest = g_PushConstants.m_nBufferDeviceAddress.g_vTestFloat4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accessing syntax is irregular here. A vk::buffer_ptr<T> can’t be treated as member accessing T. I think there are two options:

(1) Add a Get method to vk::buffer_ptr which returns a reference. This would be consistent with some of the existing buffer accessors, and provides regular syntax which can be supported for all HLSL versions.
(2) For HLSL 202x a dereference (->) operator could be added to indirect to the members.

Copy link
Contributor Author

@greg-lunarg greg-lunarg Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For HLSL 202x a dereference (->) operator could be added to indirect to the members.

Does that mean that "->" cannot be added today? If not, what approximate calendar date could this be added and released?

Copy link
Contributor Author

@greg-lunarg greg-lunarg Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding get(), would the above line then be?:

float4 vTest = g_PushConstants.m_nBufferDeviceAddress.get().g_vTestFloat4;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The support for dereference operators is tied to adding references, which hasn’t yet been done. HLSL 202x doesn’t have a schedule yet, but we’re already starting to do some preliminary work on it.

Yes, that is how the Get() method should work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to have the full reference support to use ->?
For now parser seems to ignore MemberExpr->isArrow() and allow . and -> (isArrow is just always false), and then fails in
tools/clang/lib/Sema/SemaExprCXX.cpp

Could we check the type here and allow arrow operators for this type without having the full reference implem?
This this feature can being used with the proper syntax, and then real reference support can roll-in without disturbances?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t need to have references fully implemented to add -> and * dereference operators to built-in types, but those operators will be HLSL 202x features, not available in the older language versions.

Right, just unsure how the release of the HLSL 202x relates to this. Since we add a new type to the language, how would that be different from adding allowing the -> operator for this type? Is it because you don't want to have the operator being tied to a specific type for now instead of a more generic semantic?

Agree with Greg, as the get() does seem to imply we load the full struct, and then hope for the optimizer to only load 1 field.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me get() simply reminds me of a smart pointer in C++ where you retrieve the underlying weak pointer. I don't necessarily think of the pointed-to object being loaded. Just offering a different perspective, but I also see your point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reverse order:

Agree with Greg, as the get() does seem to imply we load the full struct, and then hope for the optimizer to only load 1 field.

I don't think this is a concern we should design around. As @vettoreldaniele pointed out this is common syntax for C++ smart pointers so our users will be familiar with it, also the existing HLSL resource types that support subscript operators and Load methods already have the behavior where a load of the full object is not implied by the object being returned. It is 100% the expectation of our users that even if a subscript or Load method is used to store the full object to a local temporary, unused fields are optimized away and their loads removed from the final optimized code.

Right, just unsure how the release of the HLSL 202x relates to this. Since we add a new type to the language, how would that be different from adding allowing the -> operator for this type? Is it because you don't want to have the operator being tied to a specific type for now instead of a more generic semantic?

Is there a specific Vulkan/SPIR-V version requirement that this is gated on?

In general for HLSL we have two revisioning axis: language version and runtime version. In the past we've kinda blurred the two and it has caused a lot of problems. We're trying to be much more explicit about the separation.

Language versioning changes the core language of HLSL: syntax, grammar, semantics, etc.

Runtime version (for DXIL this is Shader Models) changes the library functionality of the language: data types, methods, etc.

There's an overlap here when you talk about adding something like a dereference operator which does not exist in the language. In this case you are adding a data type to the runtime version, and a new language construct on the language side. Both need to be versioned.

Having a Get method provides a syntax to use the object which is independent of the language version, so it can be exposed across all HLSL language versions supported by DXC.

From the specification perspective, HLSL 2021 is fully baked, done and out the door. We don't have a formal language spec for HLSL 2021 (although I have been writing one), but if we had a complete spec it would contain no reference to dereference operators because they don't exist in HLSL 2021 (or earlier versions). Introducing a new operator syntax is a change to the language and thus must be tied to a language version. Users should be able to know what syntaxes are supported by a compiler based on knowing what language version(s) the compiler supports. For a non-HLSL analogy: you wouldn't expect a C++98 compiler to support Lambdas, you shouldn't expect a HLSL 2021 compiler to support dereferences.

From an implementation perspective things are a little different and more nuanced. We have in the past exposed limited functionalities of new language modes in older language modes. A most recent example was the addition of the select, and and or built-in functions in versions older than HLSL 2021. We did that to ease developer transition by allowing valid HLSL 2021 code to be valid in earlier versions. Those intrinsics also did not introduce new syntax or grammar constructs. As another example, we did not expose HLSL 2021 templates in HLSL 2018 or earlier. templates did introduce new syntaxes. Because this is a completely new syntax I'm not sure this is good candidate to expose in earlier HLSL versions.

If the implementation is sufficiently simple, robust and doesn't rely on intrusive language changes, or expose other breakages in language semantics we could consider exposing it as an extension to earlier versions of HLSL. I've previously advocated internally for allowing language extensions when they are non-breaking, strictly additive changes and have a compatibility diagnostic implemented (meaning the compiler emits a warning if you use the feature while setting an old language version). That might be an option here, but I wouldn't want to commit to it until we get into the implementation and can fully explore the interactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I am fine with get().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation!
for get(), missed the parallel with shared pointers. Good argument, fine with get().

Thanks for the clarification on the frontier between HLSL version and shader model version. Those arguments make sense 😊


### Differences from C++ Pointers

vk::buffer_ptr is different from a C++ pointer in that a selection “.” operation can and must be applied to a buffer pointer to de-reference it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this conflicts with the future directions of HLSL, so it would be better to add a more forward looking syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


### Buffer Pointer Target Alignment

The target alignment `A` of `vk::buffer_ptr(T,A)` must be at least as large as the largest basic type in the buffer reference struct `T`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure this is correct. Is the intent here to match C alignment rules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just something I deduced based on my understanding of the overall feature.

When you say you think this is not correct, are you thinking:

  1. A cannot be as small as the largest basic type alignment in the buffer, or

  2. A can be smaller than the largest basic type alignment in the buffer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On many architectures alignments aren’t strictly tied to the sizes of basic types. For example, what is the required alignment for a float3?

If I have a float3 inside a struct, what is the alignment requirement of the struct?

In C a struct’s alignment is defined as the alignment of the most restrictive member (which is not the size of the member).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what Greg intended was correct, but is not worded clearly.

Correct me if I am wrong, but I think Greg meant to say:

The target alignment A of vk::buffer_ptr(T,A) must be at least as large as the largest alignment of a basic type in the buffer reference struct T.

I believe this is the same as what Chris mentioned:

In C a struct’s alignment is defined as the alignment of the most restrictive member

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s definitely part of it.

Is the expected alignment based on the CPU or GPU data layout rules? The HLSL buffer packing rules don’t match C struct packing rules, so we probably need to be a bit explicit here about what is expected (see: https://github.com/microsoft/DirectXShaderCompiler/wiki/Buffer-Packing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Steven is right about my intent to refer to alignment.

I think part of the problem is my use of basic type. For example, for a float3, I consider the basic type is float. Is there a better term here? Elemental?

So, one implication of my statement is: if the buffer_ptr pointed to a struct with float3, the alignment A must be at least 4 bytes. Otherwise the compiler would give an error.

Practically, the way I use A is that I take alignment that the compiler gives me for the buffer member given the packing that it has applied to the buffer, then when I generate the SPIR-V OpLoad I use the minimum of that alignment and A.

Does this clear up all concerns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're saying makes sense, but we need to get language into the spec that reflects the required alignment constraints.

* There is no equivalent to the `*` operator that will dereference the entire struct.
* There is no conversion from uint64_t to vk:buffer_ptr.
* There is no explicit pointer arithmetic. All addressing is implicitly done using the `.` pointer, or indexing an array in the struct T.
* The comparison operators == and != are not supported for buffer pointers. Comparison to null pointer should be done by first converting to uint64_t.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we convert to uint64_t for comparison to null or should we support a boolean conversion for validity?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not being able to assign it from an uint64_t: completely agree.
But why would == and != not be supported? Or nullptr != my_pointer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fair. We can't do anything with the int. We did this because that is the idiom the spir-v will have to use, but we can have something cleaner in HLSL.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure how I feel about introducing a nullptr construct… I need to think on that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like the boolean conversion idea or we could add a member function IsNull(). I find the member function more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason for the convert to uint64_t is for the null check, so replacing the convert with an IsNull() method would be acceptable. It would also remove the dependency on the availability of uint64_t.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think IsNull() is much cleaner than supporting conversion to uint64_t. We really don't want/need to expose the address value directly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fine with IsNull(). Just curious about any future divergence: cuda/opencl have pointers and != NULL, so why not other shaders in a few years? Using a bool operator with !myBufferPointer at least allows us be "compatible" with a more native pointer syntax, if we need one in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we've ever made convergence with cuda/opencl a goal for HLSL. That said, you're probably right that we will want to support pointers in the future in HLSL.

The problem with a bool conversion operator is actually an existing issue in HLSL. Overloaded operators for user defined types cannot be used for Implicit conversions (there's a bug tracking this). I posted a fly-by comment on that bug back in February because I think I spotted the problem, but I'm unsure if that can be safely changed in HLSL 2021.

If it can't be fixed safely in HLSL 2021, then a conversion operator would also need to be part of the HLSL 202x feature set where I have more extensive fixes for overload resolution planned.

@llvm-beanz
Copy link
Collaborator

As a general comment, please update the PR description with a meaningful description.

@llvm-beanz
Copy link
Collaborator

I think we should also consider renaming the type to BufferPointer, which is more consistent with HLSL’s design. Curious if @s-perron or @Keenuts have thoughts on that.

@Keenuts
Copy link
Collaborator

Keenuts commented Apr 5, 2023

I think we should also consider renaming the type to BufferPointer, which is more consistent with HLSL’s design. Curious if @s-perron or @Keenuts have thoughts on that.

You mean moving it outside of the vk:: namespace or is is just about snake vs camel?
If it's moving it outside of the vk:: namespace, yes, as long this is the kind of feature DX would/could support at some point in the future.

If it's only the case, sure, as all samplers/texture types are already camel-case.

Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally looks good. It will be a useful feature. I don't have any suggestions other than those already made by others.

@llvm-beanz
Copy link
Collaborator

If it's only the case, sure, as all samplers/texture types are already camel-case.

Ah yes, I just meant CamelCase instead of underscore_separated to be consistent with other HLSL data types.

@greg-lunarg
Copy link
Contributor Author

I think we should also consider renaming the type to BufferPointer

So vk::BufferPointer? I am ok with that.

@greg-lunarg greg-lunarg changed the title Add vk::buffer_ptr to HLSL Add vk::BufferPointer to HLSL Apr 5, 2023
@greg-lunarg
Copy link
Contributor Author

As a general comment, please update the PR description with a meaningful description.

I have improved it. Let me know if you want more detail.

@llvm-beanz
Copy link
Collaborator

One area from the template that we really need to dig into and elaborate on here is:

How does it work interact other HLSL features (semantics, buffers, etc)?

Some specific questions to consider:

  • What would it mean to apply a semantic annotation to a BufferPointer?
  • What is the expected behavior if a BufferPointer is inside a structure loaded from a resource?
  • Can a BufferPointer be put in a Union?
  • Can a BufferPointer be put inside a raytracing payload or attribute structure?

@greg-lunarg
Copy link
Contributor Author

What would it mean to apply a semantic annotation to a BufferPointer?

Any annotations to a BufferPointer or a member contained directly or indirectly within a BufferPointer is disallowed.

@greg-lunarg
Copy link
Contributor Author

What is the expected behavior if a BufferPointer is inside a structure loaded from a resource?

The BufferPointer is loaded along with all other values of the structure and has the same value as it did within the resource.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Apr 5, 2023

Can a BufferPointer be put in a Union?

Are there unions in HLSL? Not finding them. If there are, I think it should be ok. BufferPointers are defined to have size and alignment of a uint64_t. They are not opaque.

@greg-lunarg
Copy link
Contributor Author

Can a BufferPointer be put inside a raytracing payload or attribute structure?

At this point, I am going to say no. The spec does say where the type can be used (buffer members, local variables, function args and returns). All other uses are disallowed.

@greg-lunarg
Copy link
Contributor Author

I think what you're saying makes sense, but we need to get language into the spec that reflects the required alignment constraints.

The HLSL spec refers to components. So how about this?:

The target alignment A of vk::BufferPointer(T,A) must be at least as large as the largest alignment of all scalar component types contained in the BufferPointer's struct T and its structs.

@llvm-beanz
Copy link
Collaborator

Are there unions in HLSL? Not finding them. If there are, I think it should be ok. BufferPointers are defined to have size and alignment of a uint64_t. They are not opaque.

Unions were a planned feature in HLSL 2021 that didn't make it in time. They will almost certainly be part of HLSL 202x since much of the work is already done. See the proposal here.

The HLSL spec refers to components. So how about this?:

The target alignment A of vk::BufferPointer(T,A) must be at least as large as the largest alignment of all scalar component types contained in the BufferPointer's struct T and its structs.

Since the intent here isn't to introduce new alignment requirements for data types maybe we can keep this simpler:

The address referenced by a vk::BufferPointer<T> must be correctly aligned for T following HLSL buffer storage alignment constraints.

At this point, I am going to say no. The spec does say where the type can be used (buffer members, local variables, function args and returns). All other uses are disallowed.

I think this needs to be explored more. If they can be passed around through functions and stored in local variables... why not? (I don't have an actual answer or opinion here, I'm legitimately asking for a reasoning one way or the other).

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Apr 6, 2023

Since the intent here isn't to introduce new alignment requirements for data types maybe we can keep this simpler:

The address referenced by a vk::BufferPointer must be correctly aligned for T following HLSL buffer storage alignment constraints.

How about?:

The alignment A of vk::BufferPointer<T,A> must be consistent with all component elements within struct T.

@greg-lunarg
Copy link
Contributor Author

Can a BufferPointer be put inside a raytracing payload or attribute structure?

At this point, I am going to say no. The spec does say where the type can be used (buffer members, local variables, function args and returns). All other uses are disallowed.

I think this needs to be explored more. If they can be passed around through functions and stored in local variables... why not? (I don't have an actual answer or opinion here, I'm legitimately asking for a reasoning one way or the other).

As for payloads, I can't think of a reason for disallowing it, although I am not sure how useful it would be as payloads tend to be small.

It seems intersection attribute structures would be similar. For fixed function triangle intersection, it looks like those have a predetermined type (a struct containing a float2) so cannot use it. As for axis-aligned bounding box, it might be possible, but I again I am not sure how useful it would be since these are usually no more than 32 bytes in size.

@natevm
Copy link

natevm commented Apr 7, 2023

Can a BufferPointer be put inside a raytracing payload or attribute structure?

At this point, I am going to say no. The spec does say where the type can be used (buffer members, local variables, function args and returns). All other uses are disallowed.

I think this needs to be explored more. If they can be passed around through functions and stored in local variables... why not? (I don't have an actual answer or opinion here, I'm legitimately asking for a reasoning one way or the other).

As for payloads, I can't think of a reason for disallowing it, although I am not sure how useful it would be as payloads tend to be small.

It seems intersection attribute structures would be similar. For fixed function triangle intersection, it looks like those have a predetermined type (a struct containing a float2) so cannot use it. As for axis-aligned bounding box, it might be possible, but I again I am not sure how useful it would be since these are usually no more than 32 bytes in size.

IMO it is critical that we allow these buffer pointer types to be embedded in shader binding table records at the very least, since this allows OptiX codes to migrate to HLSL. With Intel ARC and NV also sorting these SBT records, we should be treating the SBT as the “correct” way to access buffer pointers in hit programs.

Arguably we should allow buffer pointers be passed in ray payload structures as well. We depend on this for many advanced ray tracing workloads, and are currently using the RawBufferLoad/Store with 64 bit addresses to do this.

In the past, we have passed pointers to stacks this way. It’s a common mechanism to extend the size of the ray payload. An intersection program might increment a count and store an intersection into a buffer passed in the ray payload. Useful for anyhit based particle compositing and range queries. Compromise is writing to VRAM via the buffer pointer, but for correctness this is sometimes required.

I can’t think of a reason to pass one in a hit attribute…

Also, its 32 4-byte registers, not 32 bytes. (I used to work at NVIDIA, at least on Ampere this was the limit. OptiX docs can confirm this.) So, I’d also argue ray payload attributes are more powerful than they’re given credit.

@natevm
Copy link

natevm commented Apr 7, 2023

Is it possible to make a vk::BufferPointer<vk::BufferPointer<T,A>,A> here? We depend on this with the current RawBufferLoad/Store API for a couple applications. (Also, why must T be a struct? Why can’t I have a buffer of built-in type, like a buffer of float3’s?)

For example, a grid of points, where each grid cell contains a different number of points, one might want a multidimensional buffer pointer, where the first dimension is a buffer of cells in the grid, and the second dimension is a buffer of points in the cell. (Often used for fluid simulations / SPH simulations)

Another example would be direct sampling of emissive triangles from one of many emissive meshes. First dimension would be the mesh, and second would be the triangle in that mesh.

Note, I’m asking about pointers to pointers here. Flattening multidimensional lists isn’t always possible, since you might want to have one buffer referenced in multiple locations (eg the SBT hit record for one geometry for surface normal generation, and in another separate buffer of buffers in a ray generation SBT record for triangle sampling)

@greg-lunarg
Copy link
Contributor Author

@llvm-beanz @natevm I will remove language which says use of this type is disallowed beyond local variables and function args and returns. Still, I would rather not try enumerating ALL places they are allowed. Can I rather say?: BufferPointer is supported wherever the underlying runtime and API supports them. This includes local variables and function args and returns.

@greg-lunarg
Copy link
Contributor Author

Is it possible to make a vk::BufferPointer<vk::BufferPointer<T,A>,A> here?

@natevm Does it work for you to have a vk::BufferPointer<T,A> where T is a struct containing a vk::BufferPointer? If so, I think this approach would solve all of your related requests above. Remember that a BufferPointer is a pointer to a buffer, and a buffer's underlying type is a struct. The struct is needed to support the layout that is needed. That is the reason for the restriction. But it should still allow you to do everything you want to do, I think.

@llvm-beanz
Copy link
Collaborator

Still, I would rather not try enumerating ALL places they are allowed. Can I rather say?: BufferPointer is supported wherever the underlying runtime and API supports them. This includes local variables and function args and returns.

I don't think this is sufficient. When we move from specifying to implementing this change we need to implement the compiler to enforce the appropriate restrictions on where BufferPointer or data types containing BufferPointer members can be used, and we need to implement tests to verify that behavior. Leaving that as a gap in the proposal will only make it harder for us to verify a correct implementation.

@natevm
Copy link

natevm commented Apr 7, 2023

Is it possible to make a vk::BufferPointer<vk::BufferPointer<T,A>,A> here?

@natevm Does it work for you to have a vk::BufferPointer<T,A> where T is a struct containing a vk::BufferPointer? If so, I think this approach would solve all of your related requests above. Remember that a BufferPointer is a pointer to a buffer, and a buffer's underlying type is a struct. The struct is needed to support the layout that is needed. That is the reason for the restriction. But it should still allow you to do everything you want to do, I think.

That would work. Mostly just awkward for cases like buffers of vertices, since float3 is not a struct. At the same time, I get the concern over struct alignment, so that seems like a good compromise IMO.

@natevm
Copy link

natevm commented Apr 7, 2023

Still, I would rather not try enumerating ALL places they are allowed. Can I rather say?: BufferPointer is supported wherever the underlying runtime and API supports them. This includes local variables and function args and returns.

I don't think this is sufficient. When we move from specifying to implementing this change we need to implement the compiler to enforce the appropriate restrictions on where BufferPointer or data types containing BufferPointer members can be used, and we need to implement tests to verify that behavior. Leaving that as a gap in the proposal will only make it harder for us to verify a correct implementation.

Perhaps it would be better to identify cases where vk::BufferPointer wouldn’t be allowed? Are there any cases in particular that come to mind?

@greg-lunarg
Copy link
Contributor Author

@natevm You mentioned raytracing payloads as a useful place for BufferPointer to be used. Are there other raytracing structures they would be useful in? Please list.

@greg-lunarg
Copy link
Contributor Author

greg-lunarg commented Apr 7, 2023

I don't think this is sufficient. When we move from specifying to implementing this change we need to implement the compiler to enforce the appropriate restrictions on where BufferPointer or data types containing BufferPointer members can be used, and we need to implement tests to verify that behavior. Leaving that as a gap in the proposal will only make it harder for us to verify a correct implementation.

@llvm-beanz @natevm I think we want to disallow BufferPointer type from input and output variables. And we want to allow static. Any disagreement?

@natevm
Copy link

natevm commented Apr 7, 2023

@natevm You mentioned raytracing payloads as a useful place for BufferPointer to be used. Are there other raytracing structures they would be useful in? Please list.

Shader binding table records are the other big one. Beyond that, none that I can think of. 🙂

@greg-lunarg
Copy link
Contributor Author

I think I have now made all changes requested or approved by @llvm-beanz.

One request not yet added is for a method to convert uint64_t to vk::BufferPointer. This was removed at the request of @s-perron. There is explanatory text saying that this was done to increase safety and reduce the chances of creating an invalid pointer.

@natevm
Copy link

natevm commented Apr 13, 2023

I think I have now made all changes requested or approved by @llvm-beanz.

One request not yet added is for a method to convert uint64_t to vk::BufferPointer. This was removed at the request of @s-perron. There is explanatory text saying that this was done to increase safety and reduce the chances of creating an invalid pointer.

This is a bad take, and we should add back in initialization. Users can just as easily make invalid pointers by uploading incorrect data from the host to the device. Why should initializing a pointer on the device be any different? This stance on safety is inconsistent, and in my opinion impossible to truly achieve without a unified language across host and device.

Instead, we should be clearly warn that initialization is unsafe if the uint64_t is not a true device address.

I don’t want HLSL unnecessarily constraining the code I write. Especially when it’s high performance GPU code… By preventing pointer initialization from a uint64_t, we force HLSL users to synchronize the GPU back to the CPU anytime a pointer needs initialization from a 64 bit device address.

Constructing from a 64 bit address also gives a clear mechanism for type casting (unless there’s another way to do this I’m unaware of?). And type casting is an important property of pointers in many high performance GPU algorithms, especially with raw and complex data structures.

Ultimately I think this argument that it is about safety is inconsistent. Pointers will be unsafe, regardless of if they can be initialized by a 64 bit device address on the device or not. They should be flexible and allow me to write the code I need to write to achieve high performance applications. Without initialization from a true uint64_t address, we place counterproductive constraints on developers without truly solving safety concerns.

Create a new first class type vk::BufferPointer which allows users to
efficiently reference buffer device addresses and allows tools to
analyze and report on usage in a logical rather than physical way i.e.
names rather than numeric offsets.
@s-perron
Copy link
Collaborator

This is a bad take, and we should add back in initialization. Users can just as easily make invalid pointers by uploading incorrect data from the host to the device. Why should initializing a pointer on the device be any different? This stance on safety is inconsistent, and in my opinion impossible to truly achieve without a unified language across host and device.

Could we have this discussion in a follow up issue? There is more to this than just having invalid addresses. There are also aliasing rules for these pointers that we have not talked about at all. The sample spir-v does not have any "AliasedPointer" decorations, so the assumption might be that all buffer pointers have to be restrict pointers. If that is the case, it should be stated explicitly. We need to make sure we are clear on the aliasing rules. Tracking aliasing in a compiler is much easier without arbitrary pointer arithmetic. I think all of these things need to be considered together, and this PR is already too busy that the conversation will be lost.

Instead, we should be clearly warn that initialization is unsafe if the uint64_t is not a true device address.

We need something more restrictive than that. To have a valid spir-v module, we would have to add a requirement in the spec that address must be to memory on the appropriate storage class. We need to guarentee that it will not get the same address as a "logical address" in spir-v. This will break the spir-v aliasing rules, and will cause subtle bugs because of compiler optimizations.

Ultimately I think this argument that it is about safety is inconsistent. Pointers will be unsafe, regardless of if they can be initialized by a 64 bit device address on the device or not.

You might be setting up a false dichotomy. You seem to be arguing that if we cannot make it safe in all cases, then there is no point in trying to reduce errors anywhere. One difference could be debugability. I'm guessing it would be much easier to debug the cpu than the gpu just based on the tooling that is available, so helping reduce bug in the shader could still be a productivity boost. We also have to consider how the compilers will optimize the code, and still keep it correct. When I use to work on C/C++ compilers, I spent too much time finding "compiler bugs" that were caused by the application violating the c++ ansi aliasing rules. This hurts everybody. I think we need to try to have a clear set of aliasing rules, and make it hard for people to break them, if not impossible. This will require more discussion.

@llvm-beanz
Copy link
Collaborator

@greg-lunarg, in the future can you please not force-push updates? We can always squash the commits on merge, but force pushing during the review makes it more difficult to see what changed between versions.

@greg-lunarg
Copy link
Contributor Author

@natevm Once this PR is merged, can you please open a new issue against this document showing a use case where conversion from uint64_t is required to get reasonable performance. I am presuming you are somehow manufacturing new uint64_t addresses in the shader and sharing them. True?

@s-perron I will open up a new issue regarding the aliasing properties of vk::BufferPointer. TL;DR All different instances of type vk::BufferPointer are assumed to not alias each other. Any desired synchronization of writes and reads through different buffer pointer instances must be executed explicitly.

@natevm
Copy link

natevm commented Apr 13, 2023

@greg-lunarg I unfortunately suspect if we do not discuss device-side vk::BufferPointer initialization now, then it will never be prioritized, even if we make another issue. I also do not have the resources to really argue for this. Don’t languages like CUDA expose pointer aliasing through the “volatile” keyword?

@s-perron a bit of a false dichotomy, but you also make a bit of a false dichotomy. You are assuming preventing device initialization is the only way to improve safety, but there are likely additional, likely better options. For example, we might allow vk::BufferPointer initialization only if a special compiler flag is specified, which would dissuade its use by default but also allow the flexibility in special circumstances.

One other thought I had was that constructing vk::BufferPointers on the device might provide additional options for safety. DXC cannot predict host side behavior, but potentially down the road, pointer initialization could be error checked by emitting additional instructions in debug builds.

My big concern actually is about casting. We need to be able to cast pointers from one type to another. How is this expected to work with the current vk:BufferPointer?

@llvm-beanz
Copy link
Collaborator

@greg-lunarg I unfortunately suspect if we do not discuss device-side vk::BufferPointer initialization now, then it will never be prioritized, even if we make another issue. I also do not have the resources to really argue for this. Don’t languages like CUDA expose pointer aliasing through the “volatile” keyword?

This gets really quickly outside the scope of what we can handle in the HLSL language. SPIR-V makes assumptions that pointers don't alias, we can't change SPIR-V, that's the Khronos Group's space.

For example, we might allow vk::BufferPointer initialization only if a special compiler flag is specified, which would dissuade its use by default but also allow the flexibility in special circumstances.

I'm going to put a strong dislike on this idea... Compiler flags that change what is or isn't legal the language are bad, and we should strive to not introduce them (and remove the few we already have for legacy reasons).

@natevm
Copy link

natevm commented Apr 13, 2023

Fair. Still, I think folks are going to expect to be able to cast a pointer from one type to another... That is a very basic feature.

At the moment, all I can think of is to create a vk::BufferPointer of every type that I can imagine, then initialize all those pointers to the same real address on the host. That's pretty gross.

Maybe some template mechanism?

template <typename T>
vk::BufferPointer<T> buffer_p;

buffer_p.get<float>() = 1.f;
buffer_p.get<int>() = 1;

Arg... the lack of pointer arithmetic really does make this things not pointers. They are clearly references... I agree with @devshgraphicsprogramming here

* Author(s): [Greg Fischer](https://github.com/greg-lunarg)
* Sponsor(s): [Chris Bieneman](https://github.com/llvm-beanz), [Steven Perron](https://github.com/s-perron), [Diego Novillo](https://github.com/dnovillo)
* Status: **Under Consideration**
* Planned Version: Retroactive addition to Vulkan X.X (requires SPIR-V X.X. Some language details require HLSL 202x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to know specifically which Vulkan/SPIR-V versions are required for this feature.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device addresses are officially supported in core Vulkan 1.2. But some earlier versions support the extension as well, depending on the vendor.

Not sure about SPIRV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vulkan 1.2 and SPIR-V 1.3. Done.

This new type will have the following operations

* Copy assignment and copy construction - These copy the value of the pointer from one variable to another.
* Dereference Method - The get() method represents the struct rvalue pointed at by the pointer to which the get() is applied. Note that this does not necessarily imply the entire value is physically loaded at this point; it merely represents the rvalue that would be loaded, similar to the effect of the * operator when applied to a pointer in C++. As selection . operators are applied to the get(), the data that is physically loaded is reduced appropriately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Get() capital G to be consistent with HLSL style conventions.

I think it is probably more accurate that this is a const lvalue reference than an rvalue since I assume that's how this should likely be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


### Differences from C++ Pointers

vk::BufferPointer is different from a C++ pointer in that a selection “.” operation can and must be applied to a buffer pointer to de-reference it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Get() to dereference.


### Buffer Pointer Target Alignment

The target alignment `A` of `vk::BufferPointer(T,A)` must be at least as large as the largest component type in the buffer pointer's pointee struct type `T` or the compiler may issue an error.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vk::BufferPointer<T,A>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


### Buffer Pointer Usage

vk::BufferPointer cannot be used in Input and Output variables. It also cannot be used in Unions, when those finally appear in HLSL.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: strike the word finally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@natevm
Copy link

natevm commented Apr 13, 2023

Ultimately, without pointer arithmetic or casting, I do not see a purpose for this proposal. In fact, it could be detrimental. These are the two key reasons why we are using vk::RawBufferLoad and Store. Without them, our library will cease to work, and we will be forced to either move away from HLSL, or to maintain a fork.

@llvm-beanz
Copy link
Collaborator

Ultimately, without pointer arithmetic or casting, I do not see a purpose for this proposal. In fact, it could be detrimental. These are the two key reasons why we are using vk::RawBufferLoad and Store. Without them, our library will cease to work, and we will be forced to either move away from HLSL, or to maintain a fork.

I think this is a bit hyperbolic. Your library isn't going to cease to work because this proposal doesn't remove any functionality that exists today. It may not be a fit for you to update to this feature, but that's not the same as pulling the rug out from under you and forcing you to "move away from HLSL, or to maintain a fork."

We hear your feedback. Casting is absolutely something we need to cover and describe in this proposal before we accept it as a feature for HLSL.

I think there are still some questions we need to sort out about arithmetic and aliasing.

This proposal is not yet done being designed. Even after we merge this document, it is under consideration not an accepted feature.

We'll iterate on the design until we get to something we're happy with, and our goal is to address meaningful needs by users, but it also almost certainly won't be everything that everyone wants.

Think of this PR as the first step in a process that has a few more steps to go. We're trying to move quickly to not roadblock and to get this into user hands as fast as we can while also ensuring a maintainable and high quality design.

@natevm
Copy link

natevm commented Apr 13, 2023

I think this is a bit hyperbolic. Your library isn't going to cease to work because this proposal doesn't remove any functionality that exists today. It may not be a fit for you to update to this feature, but that's not the same as pulling the rug out from under you and forcing you to "move away from HLSL, or to maintain a fork."

Because this impacts my job and research, I am being a bit hyperbolic yes. But it is important. We really do need these features. So long as we don't remove the vk::RawBufferLoad and vk::RawBufferStore commands right away, at least until casting and aliasing are figured out wit vk::BufferPointers, then I'm ok to proceed.

This proposal is not yet done being designed. Even after we merge this document, it is under consideration not an accepted feature.

Ah, I misunderstood the process here. That makes sense. Yes, could be discussed in another PR on the same proposal.

Copy link

@natevm natevm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one small thing we might add to this proposal is that vk::RawBufferStore also is a thing that has issues, which would also be addressed with a vk::BufferPointer.

* Author(s): [Greg Fischer](https://github.com/greg-lunarg)
* Sponsor(s): [Chris Bieneman](https://github.com/llvm-beanz), [Steven Perron](https://github.com/s-perron), [Diego Novillo](https://github.com/dnovillo)
* Status: **Under Consideration**
* Planned Version: Retroactive addition to Vulkan X.X (requires SPIR-V X.X. Some language details require HLSL 202x
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device addresses are officially supported in core Vulkan 1.2. But some earlier versions support the extension as well, depending on the vendor.

Not sure about SPIRV


## Motivation

vk::RawBufferLoad() is currently used to address physical storage buffer space. Unfortunately, use of this function has a number of shortcomings. One is that it generates low-level SPIR-V so that tools such as spirv-reflect, spirv-opt and renderdoc do not have the context to analyze and report on which members of a buffer are used in a logical manner. A bigger problem is that the HLSL programmer must compute the physical offsets of the members of a buffer which is error prone and difficult to maintain.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to also describe similar limitations with vk::RawBufferStore(). I'm not sure if this is a driver issue or something else, but could be related: microsoft/DirectXShaderCompiler#4620

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks!

* There is no explicit pointer arithmetic. All addressing is implicitly done using the `.` pointer, or indexing an array in the struct T.
* The comparison operators == and != are not supported for buffer pointers.

Most of these restrictions are there for safety. They minimize the possibility of getting an invalid pointer. If the get() method is used on a null pointer, the behaviour is undefined.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed earlier, these restrictions prevent key functionality of pointers exposed by vk::RawBufferLoad and Store. Specifically, casting and pointer arithmetic need to be exposed somehow.

But for the sake of progress, we can move this discussion to another issue.

@llvm-beanz
Copy link
Collaborator

So long as we don't remove the vk::RawBufferLoad and vk::RawBufferStore commands right away, at least until casting and aliasing are figured out wit vk::BufferPointers, then I'm ok to proceed.

There is no proposal to remove vk::RawBufferLoad and vk::RawBufferStore. That is not in this proposal. Nobody is suggesting removing them. This proposal doesn't come close to covering all the uses for the existing features.

Ah, I misunderstood the process here. That makes sense. Yes, could be discussed in another PR on the same proposal.

Our process docs aren't perfect, but they do cover the phases of the proposal process. This proposal is under consideration so there's still a bit to go before we start accepting changes to support it into DXC.

@greg-lunarg
Copy link
Contributor Author

I have pushed the latest round of review fixes.

Copy link
Collaborator

@s-perron s-perron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve with a comment. There are still two open issue that need to be discussed in follow up issues.

  1. How will aliasing work?
  2. Will we allow some type of type punning, and, if so, how will we do it?

@s-perron
Copy link
Collaborator

s-perron commented Apr 14, 2023

SPIR-V makes assumptions that pointers don't alias, we can't change SPIR-V, that's the Khronos Group's space.

One of the issues we will need to discuss further is whether or now we need to expose SPIR-V's alias and restrict decorations. The pointers can alias, but they have to be labeled as aliasing.

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#AliasingSection:~:text=For%20variables%20holding,alias%20each%20other.

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've filed some follow-up issues that reference this PR. This search should collect them all.

I think this covers all the issues mentioned by @s-perron as well, but we will file more as needed.

For now, I think we can accept this feature as under consideration while we work through the remaining design issues.

Unless anyone objects I'll merge the PR before EOD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants