
[Experimental] Scaled Dot Product Attention FlashAttention Algorithm Conversion #147

Open
yifeizh2 opened this issue Jul 3, 2024 · 1 comment · May be fixed by #131
yifeizh2 (Contributor) commented Jul 3, 2024

Utilize the FlashAttention algorithm to optimize the performance of scaled dot product attention.
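For reference, a minimal NumPy sketch of the tiled, online-softmax computation that FlashAttention is based on. This is illustrative only; the function name, block size, and shapes are assumptions for the example, not the conversion proposed in this issue or in the linked PRs.

```python
# Minimal sketch of FlashAttention-style tiled attention with an online softmax.
# Illustrative only -- not this project's implementation.
import numpy as np

def flash_attention(Q, K, V, block_size=64):
    """Compute softmax(Q K^T / sqrt(d)) V one key/value block at a time."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q, dtype=np.float64)
    row_max = np.full(n, -np.inf)   # running max of each score row
    row_sum = np.zeros(n)           # running softmax denominator

    for start in range(0, K.shape[0], block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = Q @ k_blk.T * scale                 # (n, block)

        blk_max = scores.max(axis=1)
        new_max = np.maximum(row_max, blk_max)
        correction = np.exp(row_max - new_max)       # rescale previous partial sums
        p = np.exp(scores - new_max[:, None])        # unnormalized block probabilities

        out = out * correction[:, None] + p @ v_blk
        row_sum = row_sum * correction + p.sum(axis=1)
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check against the naive (fully materialized) attention.
rng = np.random.default_rng(0)
Q = rng.standard_normal((128, 32))
K = rng.standard_normal((128, 32))
V = rng.standard_normal((128, 32))
scores = Q @ K.T / np.sqrt(Q.shape[1])
ref = np.exp(scores - scores.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
assert np.allclose(flash_attention(Q, K, V), ref)
```

The key property is that the full n-by-n score matrix is never materialized: each key/value block updates a running max and running denominator, so the result matches the naive softmax while keeping memory traffic per block bounded.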

yifeizh2 added this to the Functional llama2 milestone Jul 3, 2024
yifeizh2 linked a pull request Jul 3, 2024 that will close this issue
yifeizh2 linked a pull request Jul 8, 2024 that will close this issue
yifeizh2 modified the milestones: Functional llama2, CPU Jul 9, 2024
lmontigny commented:

Support FlashAttention + Python API.
