Create RFC for XeTile and XeGPU Dialect #655

Jianhui-Li · 2023-06-20T05:50:52Z

This is the RFC for the XeTile and XeGPU dialects. The XeTile dialect supports the tile-based programming model and decomposes the GEMM kernel into sufficiently large tiles at the subgroup level. The XeGPU dialect models Xe instructions like DPAS and 2D block load.

kurapov-peter

I think having two separate dialects is a good decision, as it would enable more use cases and separate the optimizations from the abstraction of hardware features. Analytical tools might use the XeGPU dialect directly to perform appropriate lowering, so the independence is nice. The dialect should probably also include specific instructions like barriers.

Having a single entry point for any GPGPU workload through the XeGPU dialect seems like a reasonable architectural solution to me. We'd control the lowering to either VC intrinsics or SPIR-V extensions directly and avoid duplicating this functionality in different tools. Ideally, there should be a path to lower this to LLVM and then use the native SPIR-V backend. That would require the SPIR-V extensions to be a part of vanilla LLVM though.

As a side note, I'd also consider the runtime part of the code: kernel scheduling and launch, memory allocation and movement. I'm not sure if these can/should be a part of the very same XeGPU dialect, but such primitives do arise in some scenarios when a complete graph for running a workload in a heterogeneous environment is built. In general, I'd prefer to have a simple dialect with intuitive and predictable lowering behavior (to genx code) though.

Jianhui-Li · 2023-07-26T17:34:58Z

> As a side note, I'd also consider the runtime part of the code: kernel scheduling and launch, memory allocation and movement. I'm not sure if these can/should be a part of the very same XeGPU dialect, but such primitives do arise in some scenarios when a complete graph for running a workload in a heterogeneous environment is built. In general, I'd prefer to have a simple dialect with intuitive and predictable lowering behavior (to genx code) though.

The runtime is defined inside the GPUX dialect, https://github.com/intel/mlir-extensions/tree/refactor/include/imex/Dialect/GPUX, which serves as an extension of the GPU dialect. Does that fit your needs? It supports kernel launch, memory allocation, etc.

kurapov-peter · 2023-07-27T11:01:06Z

Yes, it seems to close the gap, I saw that one after posting the comment :)

Refresh the document with the latest refinement on XeTile and XeGPU dialect.

improve the documentation layout

Improve documentation layout

improvement table

minor layout change for the table

charithaintc · 2023-11-21T20:45:02Z