Hello 👋
Where does your roadmap stand on CPU inference?
Since vLLM does not actively support running models on CPU (at least not yet), it seems that moving to vLLM would mean OpenLLM is very much meant to run on GPUs.

Replies: 1 comment
Yes, for now our focus for our customers is still on running LLMs fast, which means most of the time the serverless deployments are using GPUs. We do have CPU support on the roadmap, as on-prem deployment is also a P1 for us. We will consider some alternatives such as GGUF, or llamaInfer as seen recently. That said, I'm pretty interested in the development of MLX and am currently playing around with it over this break.
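For anyone who wants to experiment with CPU inference in the meantime, here is a minimal sketch of running a GGUF model entirely on CPU with llama-cpp-python; the model path and generation parameters below are placeholders, not something OpenLLM ships or depends on.

```python
# Minimal sketch (not OpenLLM's API): CPU-only inference on a GGUF model
# via llama-cpp-python. The model path and parameters are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=2048,        # context window
    n_threads=8,       # CPU threads to use
    n_gpu_layers=0,    # keep all layers on the CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```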