Wave Dec 2024 Release #250

Open · 1 of 24 tasks
harsh-nod opened this issue Nov 4, 2024 · 0 comments

harsh-nod (Contributor) commented Nov 4, 2024
This issue lists all feature requests and improvements slated for the Dec 2024 TKW (Wave) release.

Flash Attention performance is the highest priority.

  • FP8 Functionality & FP16 Performance Improvement
  • Adjusting k-width to maximize reads from shared memory and align layouts between the two matmuls
  • Scheduling
  • Packed Shuffles
  • Implement FP8 Attention Kernel (sketched after this list)
  • Scaling of Q has to happen after Q @ K
  • Linear offset has to be added (linear offset = 1.0 / max representable number in the fp format)
  • Causal mask (addition of a triangular matrix of 0s and -infinity)
  • Dynamic dimensions for sequence length
  • Paged Attention using vector.gather ops (see the gather sketch below)
  • Extend Attention (split-k vs warp reduction; see the merge sketch below)
  • Prefill Attention
  • Decode Attention (M = 1, with dynamic dimensions)
  • Update Paper
  • Unaligned shapes for GEMMs
  • Debugger support (add breakpoints and inspect stack on GPU)
  • Profiling support
  • Ensure that mappings modify the index sequence
  • IGEMM Performance Results
  • GEMM non-temporal loads
  • GEMM + SiLU fusion kernel (see the fusion sketch below)
  • MoE Kernel
  • Buffer loads to load K directly to shared memory
  • Buffer loads for masking
  • Understand scheduling + multi-buffering in Tensile to be able to implement it in Wave
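
The FP8 attention items above pack several semantic requirements together; the numpy sketch below shows one plausible reading. The fp8 maximum, the placement of the linear offset, and all names are illustrative assumptions, not the Wave implementation.

```python
import numpy as np

FP8_MAX = 448.0                # assumed OCP e4m3 max; e4m3fnuz would be 240.0
LINEAR_OFFSET = 1.0 / FP8_MAX  # "1.0 / max representable number in fp format"

def fp8_attention_reference(q, k, v, causal=True):
    """fp32 reference semantics for the FP8 attention items above.

    q, k, v: [seq, head_dim] float32 arrays standing in for dequantized fp8.
    """
    d = q.shape[-1]
    # Scaling of Q happens AFTER Q @ K^T: pre-scaling fp8 Q by 1/sqrt(d)
    # would push small values out of the representable fp8 range.
    s = (q @ k.T) / np.sqrt(d)
    if causal:
        seq = s.shape[0]
        # Causal mask: add a triangular matrix of 0s (on/below the diagonal)
        # and -infinity (above it).
        s = s + np.triu(np.full((seq, seq), -np.inf), k=1)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)
    # Linear offset; that it lands on P before the second matmul is an
    # assumption here -- only its value comes from the list above.
    p = p + LINEAR_OFFSET
    return p @ v
```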
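For the paged-attention item, the core of the kernel is a gather through a page table; here is a minimal numpy sketch of the addressing that would lower to a vector.gather on the GPU side. Page size, table layout, and all names are assumptions.

```python
import numpy as np

def gather_kv(kv_cache, page_table, seq_len, page_size=16):
    """Gather the K (or V) rows of one sequence from a paged cache.

    kv_cache:   [num_pages, page_size, head_dim] physical pages
    page_table: [max_logical_pages] logical-page -> physical-page mapping
    """
    pos = np.arange(seq_len)
    phys = page_table[pos // page_size]   # physical page per token
    off = pos % page_size                 # offset within the page
    # The indexed load below is the part that maps to vector.gather.
    return kv_cache[phys, off, :]         # [seq_len, head_dim]
```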
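The split-k flavor of extend/decode attention splits the K/V sequence across workgroups (or across waves, for the warp-reduction flavor) and then merges the partial results using their softmax statistics. The merge below is standard log-sum-exp combining; the layout and names are assumptions.

```python
import numpy as np

def merge_attention_splits(outs, maxes, sums):
    """Combine per-split partial attention outputs.

    outs:  [splits, M, head_dim]  unnormalized partial outputs (exp(S_i) @ V_i)
    maxes: [splits, M]            per-split row maxima of the scores
    sums:  [splits, M]            per-split sums of exp(scores - row max)
    """
    m = maxes.max(axis=0)                        # global row max across splits
    corr = np.exp(maxes - m)                     # per-split correction factor
    num = (outs * corr[..., None]).sum(axis=0)   # rescaled numerator
    den = (sums * corr).sum(axis=0)              # rescaled denominator
    return num / den[..., None]                  # [M, head_dim]
```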
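The GEMM + SiLU fusion item amounts to running the activation in the GEMM epilogue while the accumulator tile is still in registers, rather than as a separate elementwise pass over global memory. Reference semantics (a sketch, not the fused kernel):

```python
import numpy as np

def gemm_silu(a, b):
    """silu(a @ b), where silu(x) = x * sigmoid(x) = x / (1 + exp(-x))."""
    c = a @ b
    return c / (1.0 + np.exp(-c))
```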

================================================

Week 1 (Nov 8th)

  • Scheduling

Week 2 (Nov 15)

Ivan

  • Adding support for using tensors from the kernel in mappings for reads and writes

Harsh

  • Create a FA page table dataset for Ivan to test his PR on
  • Create a harness for SGLang grok / llama where we can test baseline perf, add our kernels, and see perf (with Sai)
  • Write a decode attention kernel
  • Unaligned sequence length & unaligned head dim

Stan

  • Adjusting k-width to maximize reads from shared memory and align layouts between the two matmuls
  • Scheduling meeting with Giuseppe to show the kernel and help him iterate
  • Meeting with the quantization team on the 15th to show the FP8 kernel

Unassigned

  • Getting kernels with hipBLASLt where we can turn knobs and relate knobs to output kernels
  • Packed Shuffles
  • Dynamic & aligned attention fp16 (M & K2 not specified)

Week 3 (Nov 22)

  • Identifying which knobs represent multi-buffering and investigating a strategy for multi-buffering

Week 4 (Nov 29)

  • Start drafting an implementation strategy for mimicking multi-buffering (see the sketch below)
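
As a starting point for that strategy, a minimal Python sketch of what multi-buffering reduces to in the simplest (double-buffered) case: prefetch tile k+1 into one shared-memory buffer while the matrix units consume tile k from the other. Function names and structure are assumptions, not Tensile's or Wave's actual scheduling.

```python
def double_buffered_loop(load_tile, compute_tile, num_k_tiles):
    """Ping-pong between two shared-memory buffers across the GEMM K loop.

    load_tile(k):      copy global-memory tile k into a fresh buffer
    compute_tile(buf): run the matmul work on one shared-memory buffer
    """
    bufs = [None, None]
    bufs[0] = load_tile(0)                    # prologue: prefetch first tile
    for k in range(num_k_tiles):
        if k + 1 < num_k_tiles:
            # Issue the next load early; on a GPU this overlaps with compute.
            bufs[(k + 1) % 2] = load_tile(k + 1)
        compute_tile(bufs[k % 2])
        # A real kernel needs a barrier here before the just-consumed
        # buffer is overwritten by the next iteration's load.
```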
harsh-nod changed the title from "Dec 2024 Release" to "Wave Dec 2024 Release" on Nov 4, 2024