Hello 👋
Where does your roadmap stand on CPU inference?
Since vLLM does not actively support running models on CPU (at least not yet), it seems that moving to vLLM would mean OpenLLM is very much meant to run on GPUs.

Replies: 1 comment
Yes, for now our focus for our customers is still on running LLMs fast, which means most of the time the serverless deployments are using GPUs. We do have CPU support on the roadmap, as on-prem deployment is also a P1 for us. We will consider some alternatives such as GGUF, or llamaInfer as seen recently. That said, I'm pretty interested in the development of MLX and am currently playing around with it over this break.
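For anyone who wants to experiment with CPU inference in the meantime, here is a minimal sketch of running a GGUF model entirely on CPU with llama-cpp-python; the model path and generation parameters below are placeholders, not something OpenLLM ships or depends on.

```python
# Minimal sketch (not OpenLLM's API): CPU-only inference on a GGUF model
# via llama-cpp-python. The model path and parameters are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=2048,        # context window
    n_threads=8,       # CPU threads to use
    n_gpu_layers=0,    # keep all layers on the CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```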