I was using Azure OpenAI with MultimodalWebSurfer and kept hitting rate limit errors, so I decided to try some local models with Ollama to get around them. Unfortunately, they all fail with the same `does not support tools` error.
For example, with `llama3.2-vision:11b`, the stack trace ends with:

```
.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1633, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': 'registry.ollama.ai/library/llama3.2-vision:11b does not support tools', 'type': 'api_error',
```
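For what it's worth, the same 400 can be reproduced outside AutoGen by sending a chat completion with a `tools` array straight to Ollama's OpenAI-compatible endpoint, which suggests the rejection comes from Ollama itself rather than from AutoGen. A minimal sketch (the `visit_url` tool definition here is made up purely for illustration):

```python
# Minimal repro sketch: send a tools payload directly to the local
# Ollama server (same endpoint as in the AutoGen code further down).
from openai import OpenAI, BadRequestError

client = OpenAI(
    base_url="http://host.docker.internal:11434/v1",
    api_key="NOT REQUIRED FOR LOCAL MODELS",
)

tools = [{
    "type": "function",
    "function": {
        "name": "visit_url",  # hypothetical tool, for illustration only
        "description": "Navigate the browser to a URL.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

try:
    client.chat.completions.create(
        model="llama3.2-vision:11b",
        messages=[{"role": "user", "content": "Open the AutoGen readme."}],
        tools=tools,
    )
except BadRequestError as e:
    # Prints the same "... does not support tools" 400 as above.
    print(e)
```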
I've tried the following models, and all of them gave the same error:

- `llama3.2-vision:11b`
- `llava-phi3:3.8b`
- `llava-llama3:8b`
- `minicpm-v:latest`
This is the code I tried:
```python
from autogen_ext.models.openai import OpenAIChatCompletionClient

local_multimodal_client = OpenAIChatCompletionClient(
    model="llama3.2-vision:11b",
    api_key="NOT REQUIRED FOR LOCAL MODELS",
    base_url="http://host.docker.internal:11434/v1",
    model_capabilities={
        "json_output": False,
        "vision": True,
        "function_calling": True,
    },
)
```
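As I understand it, MultimodalWebSurfer drives the browser through tool calls, so declaring `"function_calling": True` here makes the client attach a `tools` array to every request, and that is exactly what Ollama rejects for these models. (I believe recent Ollama releases also print a "Capabilities" section from `ollama show <model>`; if `tools` isn't listed there, the model's chat template has no tool-calling support.) The agent and team setup that uses this client: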
```python
from autogen_agentchat.ui import Console
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.agents.web_surfer import MultimodalWebSurfer


async def main() -> None:
    # Define an agent
    web_surfer_agent = MultimodalWebSurfer(
        name="MultimodalWebSurfer",
        model_client=local_multimodal_client,
    )
    # Define a team
    agent_team = RoundRobinGroupChat([web_surfer_agent], max_turns=3)
    # Run the team and stream messages to the console
    stream = agent_team.run_stream(task="Navigate to the AutoGen readme on GitHub.")
    await Console(stream)
    # Close the browser controlled by the agent
    await web_surfer_agent.close()


await main()
```
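One caveat on the snippet above: the bare `await main()` only works where an event loop is already running, such as in a Jupyter notebook. In a plain script you'd wrap it instead:

```python
import asyncio

if __name__ == "__main__":
    asyncio.run(main())
```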
The models are all pretty recent, so I'm wondering: are there really no open models out there at the moment that can do both vision and tool calling?