
[Usage]: GGUFed models on AMD GPUs #632

Open
TuzelKO opened this issue Sep 5, 2024 · 1 comment

TuzelKO commented Sep 5, 2024

Hello! Having studied the documentation provided, I still could not tell whether GGUF-quantized models are supported on AMD GPUs. I would like to use a Q8 or even Q4 quantization of Mistral NeMo 12B in my project, trading a little quality for generation speed. We are planning to build a server with 4-6 Radeon 7900 XTX graphics cards.

AMD's solutions look more attractive than Nvidia's in terms of performance/cost and performance/power consumption, especially for small startups.

I would also like to know whether it is possible to run one small model (for example, Mistral NeMo 12B) in parallel on several graphics cards. I don't mean splitting the model across several cards, but running the same model, fully loaded into VRAM, on each card. Or will I need to run a separate container for each graphics card?

In our project we are considering the Magnum v2 12B model (https://huggingface.co/anthracite-org/magnum-v2-12b-gguf). We are currently running it through llama.cpp, but it does not seem to be well suited to handling parallel requests from multiple users.

@TuzelKO TuzelKO changed the title [New Model]: GGUFed models on AMD GPUs [Usage]: GGUFed models on AMD GPUs Sep 5, 2024
@AlpinDale
Member

Hi. GGUF kernels should theoretically work on AMD, but they're untested since I don't have regular access to AMD compute.

Multi-GPU should work fine on AMD. Tensor parallelism splits the model's tensors evenly across the GPUs; you simply need to launch the model with --tensor-parallel-size X, where X is the number of GPUs. I don't really recommend GGUF for this, though, because it doesn't seem to scale well at the moment. For AMD, you may want to use either GPTQ or FP8 W8A8 (through llm-compressor) instead.
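
For illustration, a minimal launch sketch on a 4-GPU box. Only the --tensor-parallel-size flag comes from the comment above; the OpenAI-compatible server entry point and the local model path are assumptions, so adjust them to your install and checkpoint:

```bash
# Hedged sketch: the entry point and model path below are assumptions, not confirmed
# by this thread. --tensor-parallel-size is the flag named above; set it to the
# number of GPUs the model should be sharded across.
python -m aphrodite.endpoints.openai.api_server \
    --model /models/magnum-v2-12b-Q8_0.gguf \
    --tensor-parallel-size 4
```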
