I was using Azure OpenAI with MultimodalWebSurfer and kept hitting rate limit errors, so I decided to try some local models with Ollama to get around them. Unfortunately, they all fail with the same `does not support tools` error.
For example, with `llama3.2-vision:11b`, the stack trace ends with:

```
.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1633, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': 'registry.ollama.ai/library/llama3.2-vision:11b does not support tools', 'type': 'api_error',
```
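For what it's worth, the same 400 can be reproduced outside AutoGen by sending a chat completion with a `tools` array straight to Ollama's OpenAI-compatible endpoint, which suggests the rejection comes from Ollama itself rather than from AutoGen. A minimal sketch (the `visit_url` tool definition here is made up purely for illustration):

```python
# Minimal repro sketch: send a tools payload directly to the local
# Ollama server (same endpoint as in the AutoGen code further down).
from openai import OpenAI, BadRequestError

client = OpenAI(
    base_url="http://host.docker.internal:11434/v1",
    api_key="NOT REQUIRED FOR LOCAL MODELS",
)

tools = [{
    "type": "function",
    "function": {
        "name": "visit_url",  # hypothetical tool, for illustration only
        "description": "Navigate the browser to a URL.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

try:
    client.chat.completions.create(
        model="llama3.2-vision:11b",
        messages=[{"role": "user", "content": "Open the AutoGen readme."}],
        tools=tools,
    )
except BadRequestError as e:
    # Prints the same "... does not support tools" 400 as above.
    print(e)
```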
I've tried the following models, and all of them gave the same error:

- `llama3.2-vision:11b`
- `llava-phi3:3.8b`
- `llava-llama3:8b`
- `minicpm-v:latest`
This is the code I tried:
```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

local_multimodal_client = OpenAIChatCompletionClient(
    model="llama3.2-vision:11b",
    api_key="NOT REQUIRED FOR LOCAL MODELS",
    base_url="http://host.docker.internal:11434/v1",
    model_capabilities={
        "json_output": False,
        "vision": True,
        "function_calling": True,
    },
)
```
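As I understand it, MultimodalWebSurfer drives the browser through tool calls, so declaring `"function_calling": True` here makes the client attach a `tools` array to every request, and that is exactly what Ollama rejects for these models. (I believe recent Ollama releases also print a "Capabilities" section from `ollama show <model>`; if `tools` isn't listed there, the model's chat template has no tool-calling support.) The agent and team setup that uses this client: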
```python
from autogen_agentchat.ui import Console
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.web_surfer import MultimodalWebSurfer


async def main() -> None:
    # Define an agent
    web_surfer_agent = MultimodalWebSurfer(
        name="MultimodalWebSurfer",
        model_client=local_multimodal_client,
    )
    # Define a team
    agent_team = RoundRobinGroupChat([web_surfer_agent], max_turns=3)
    # Run the team and stream messages to the console
    stream = agent_team.run_stream(task="Navigate to the AutoGen readme on GitHub.")
    await Console(stream)
    # Close the browser controlled by the agent
    await web_surfer_agent.close()


await main()
```
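One caveat on the snippet above: the bare `await main()` only works where an event loop is already running, such as in a Jupyter notebook. In a plain script you'd wrap it instead:

```python
import asyncio

if __name__ == "__main__":
    asyncio.run(main())
```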
The models are all pretty recent, so I'm wondering: are there really no open models out there at the moment that can do both vision and tool calling?