Extend XeTile.tile to support scatter load (experimental) #811

Merged
merged 9 commits into from
Oct 7, 2024
Changes from 4 commits
17 changes: 16 additions & 1 deletion docs/rfcs/XeTile.md
@@ -64,7 +64,6 @@ To create a 2D Tile memory descriptor, the user needs to set up a tile (init_til
memref<128x128xbf16, affine_map=<(d0, d1)->(d1, d0)> into tile<64x32xbf16, #tile_attr>
```


With the tile date type, XeTile supports load_tile, prefetch_tile, and store_tile.

`load_tile` loads a tile to a 2D vector, which could be backed by a register region.
@@ -208,6 +207,22 @@ With the data being presented as 4D vector, all the vector based XeTile operatio
```
The tile_pack and tile_unpack operation is similar to pack and unpack operation of tensor dialect. The source vector must be a 2D dimension vector, and no permutation is allowed for the result 4D vector, so effectively the blocking effect is identical to tensor pack/unpack operation with inner_dims_pos = [0,1] inner_dims_pos = [0, 1].

## support for load_gather and store_scatter (experimental)
`load_gather` (aka. load) loads data with each element's address being explictly specified. The tile is created with a base address and offset for each element to be load. The result tile has a `scatter` attribute to different it from the regular tile.
@Garra1980 (Contributor), Oct 4, 2024:

different -> distinguish?

"to be load" -> "to be loaded"

```mlir
%tile0 = XeTile.init_tile %base_addr, %tile_offsets:
Contributor:

XeTile -> xetile here and below, since it is the name of a dialect operation.

Contributor Author:

fixed.

i64, vector<1x256xindex> into tile<1x256xbf16, #scatter>
```
`load_gather` (aka. load) loads data with prepared tile and mask. Attribute `padding` specifies the padding value for the out-of-boundary access. The default value is zero.
```mlir
%vector_a = XeTile.load_gather %tile_0, %mask, {padding = 1.0} :
tile<1x256xbf16, #scatter> into vector<1x256xbf16>
```
`store_scatter` stores a 2d vector to a 2D tile with `scatter` attribute.
```mlir
XeTile.store_scatter %vector_a, %mask, %tile_0 :
vector<1x256xbf16> into tile<1x256xbf16, #scatter>
```
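
For readers of this diff, the gather and scatter semantics described in the added section can be modeled in a few lines of Python. This is only a sketch and is not part of the RFC text: the function names mirror the ops, but the flat-list memory model, the `base`/`offsets` addressing in element units, and the treatment of masked-off lanes (they receive the padding value on load and are skipped on store) are assumptions made for illustration.

```python
def load_gather(memory, base, offsets, mask, padding=0.0):
    """Illustrative model: each lane reads memory[base + offset]."""
    result = []
    for offset, enabled in zip(offsets, mask):
        address = base + offset
        in_bounds = 0 <= address < len(memory)
        # Assumption: disabled or out-of-bounds lanes yield the padding value.
        result.append(memory[address] if enabled and in_bounds else padding)
    return result


def store_scatter(memory, base, offsets, mask, values):
    """Illustrative model: each enabled lane writes memory[base + offset]."""
    for offset, enabled, value in zip(offsets, mask, values):
        address = base + offset
        # Assumption: disabled or out-of-bounds lanes are skipped.
        if enabled and 0 <= address < len(memory):
            memory[address] = value


memory = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
offsets = [3, 0, 6, 9]            # the last lane points past the buffer
mask = [True, True, False, True]  # the third lane is disabled

loaded = load_gather(memory, base=0, offsets=offsets, mask=mask, padding=1.0)
store_scatter(memory, base=0, offsets=offsets, mask=mask,
              values=[-1.0, -2.0, -3.0, -4.0])
```

In this model the per-element offsets play the role of `%tile_offsets`, and `padding=1.0` corresponds to the `{padding = 1.0}` attribute in the `load_gather` example.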

## Workgroup Level XeTile extension (experimental)
`xetile.wg_map` mapping attribute allows XeTile operation to work at the workgroup level. XeTile operations work by default at the subgroup level without wg_map attribute. With wg_map attributes, XeTile operations can be applied to workgroup-level tile sizes. The attribute `xetile.wg_map` guides the lowering from the workgroup level to the subgroup level by specifying how the data is distributed across parallel subgroups. It gives the user full control over the lowering process so that the user can tune the block size for both the workgroup and subgroup for optimal performance.