Issue with Flash Attention on V100 GPU for Llama-3-VILA1.5-8B Model #109
Comments
In lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py, swap the annotation at lines 608-609 (comment out the FlashAttention branch in favor of the standard attention path).
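Rather than editing the installed `transformers` sources, the usual workaround is to pass `attn_implementation="eager"` when loading the model, which skips the FlashAttention-2 kernels entirely. The sketch below is illustrative, not taken from the VILA codebase: the model path is hypothetical, and the helper simply mirrors the compute-capability check behind the `RuntimeError` (FlashAttention-2 needs SM 8.0 / Ampere or newer; the V100 is SM 7.0).

```python
# Sketch: choose a non-FlashAttention backend on pre-Ampere GPUs.
# FlashAttention-2 requires compute capability >= 8.0; V100 is SM 7.0.

def flash_attention_supported(major: int, minor: int) -> bool:
    """Mirror the capability check that raises the RuntimeError."""
    return (major, minor) >= (8, 0)

# On a V100 (SM 7.0) this evaluates to "eager":
impl = "flash_attention_2" if flash_attention_supported(7, 0) else "eager"

# Hypothetical usage (checkpoint path is illustrative):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "path/to/Llama-3-VILA1.5-8B",
#     attn_implementation=impl,  # "eager" avoids the FlashAttention kernels
# )
```

On a real system the `(major, minor)` pair would come from `torch.cuda.get_device_capability()`.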
Thanks, it works on my GPU now! However, the output is garbage: a meaningless string of empty spaces and commas. I hit the same issue with another vision-language model, while some others work fine, so I suspect it is tied to the transformers library version. I also tried running VILA on the CPU, and in that case it worked correctly.
Me too. I've had similar issues with redundant commas and spaces. Oddly, when I feed the VILA1.5-3B model a video along with some questions, it actually performs better than the 8B model: sometimes it generates coherent responses, but other times it only replies with one to three words.
Hi, I also ran into this problem and got weird empty outputs. Could you please share how to solve this problem if you find a way out? |
Same problem: redundant commas and spaces.
I gave up and switched to an AMD card. (Translated from Chinese.)
Hi,
I am encountering an issue when running inference on the Llama-3-VILA1.5-8B model. The error message I receive is:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
I am using a V100 GPU, which is not an Ampere GPU. Could you please provide guidance on how to disable Flash Attention for this model?
Thank you!