I was sifting through the cuDNN documentation and came across these snippets:

"cuDNN BF16 and FP16 Fused Flash Attention now supports embedding dim = 256 use cases in forward propagation.
Expanded support of FP16 and BF16 Fused Flash Attention by adding the sliding window attention feature on NVIDIA Ampere and Hopper GPUs. For more information, refer to the cuDNN Developer Guide."

This is from the release notes for cuDNN 9.1.1 here:
https://docs.nvidia.com/deeplearning/cudnn/v9.1.1/release-notes.html#cudnn-9-1-1

At the time that CTranslate2 supported flash attention, it relied on cuDNN 8.8.0. Flash attention was removed from the pypi.org release for two reasons: (1) file size and (2) minimal benefit. Regarding the second point, perhaps the benefit was minimal because, at the time, CTranslate2 did not rely on cuDNN 9.1.1, which was the first version to support flash attention?
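For anyone wanting to verify this, here is a minimal standalone sketch (not CTranslate2 code) that prints the cuDNN version a build actually links against and checks whether it is at least 9.1.1; the compile command and file name are just illustrative:

```cpp
// check_cudnn_version.cpp -- report linked cuDNN version (hypothetical helper).
// Build (adjust paths as needed): g++ check_cudnn_version.cpp -I"$CUDA_HOME/include" -lcudnn
#include <cstdio>
#include <cudnn.h>          // also pulls in cudnn_version.h (CUDNN_MAJOR, ...)
#include <library_types.h>  // libraryPropertyType (MAJOR_VERSION, ...)

int main() {
  int major = 0, minor = 0, patch = 0;
  cudnnGetProperty(MAJOR_VERSION, &major);
  cudnnGetProperty(MINOR_VERSION, &minor);
  cudnnGetProperty(PATCH_LEVEL, &patch);

  // Runtime library vs. the headers the binary was compiled against.
  std::printf("linked cuDNN runtime: %d.%d.%d\n", major, minor, patch);
  std::printf("cuDNN headers:        %d.%d.%d\n",
              CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL);

  // The fused flash attention features quoted above (embedding dim 256,
  // sliding window) appear in the 9.1.1 release notes, so an 8.8.0 build
  // would not see them.
  bool at_least_9_1_1 =
      major > 9 || (major == 9 && (minor > 1 || (minor == 1 && patch >= 1)));
  std::printf("cuDNN >= 9.1.1: %s\n", at_least_9_1_1 ? "yes" : "no");
  return 0;
}
```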