Vision model support in mistral.rs

Mistral.rs supports various modalities of models, including vision models. Vision models take images and text as input and have the capability to reason over both.

Please see docs for the following model types:

Phi 3 Vision: PHI3V.md
Idefics2: IDEFICS2.md
LLaVA and LLaVANext LLAVA.md
Llama 3.2 Vision VLLAMA.md

Note for the Python and HTTP APIs: We follow the OpenAI specification for structuring the image messages and allow both base64 encoded images as well as a URL/path to the image. There are many examples of this, see this Python example.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VISION_MODELS.md

VISION_MODELS.md

Vision model support in mistral.rs

Files

VISION_MODELS.md

Latest commit

History

VISION_MODELS.md

File metadata and controls

Vision model support in mistral.rs