[Feature] Multimodal agents demo #320
The high-level class design looks good to me! Left one minor comment in the code. |
Hi, thanks for your review, but I can't see your comment in the code. Could you post it again? |
TODO:
|
Just to leave a note here: I was planning to combine this with the Hugging Face agent, but I encountered an SSLError, with or without firewall restrictions (cf. #268, #17611); you may want to check whether you have the same issue, @chenllliang. Alternatively, I plan to make a minimal example with the interpreter. |
I have updated the documentation of the multimodal prompt class. (I think it could be merged.) |
Currently developing a multimodal role-playing demo. |
I designed a pipeline for a possible application of multimodal agents' collaboration. It is called "Scientific Graph Painter" and is used to generate Python code that draws a figure from a scientific paper. It has 3 roles and possible models:
The pipeline graph is listed below. Sorry, I am too occupied with other things currently (Feb. 2024). Anyone who is interested in the topic is welcome to implement or discuss it. I think one thing needs to be done first: adding image information to agents' messages. I changed the PR title from "add multimodal prompt class" to "Multimodal agents demo". |
add MultiModalPrompt class and an example
Description
This PR introduces a new class, MultiModalPrompt, aimed at facilitating the transfer of information between multimodal agents. The class encapsulates both text prompts and additional multimodal data, so the two can be integrated and interchanged through a single object.
The updated source files are camel/prompts/multimodal.py and camel/prompts/__init__.py. An example is added in examples/multimodal/formating_example.py.
Key Features:
- Combines a text prompt (via the TextPrompt class) and multimodal information.
- Defines a list of supported modalities (MODALITIES), and it can validate the provided modalities against this list.
- The format method allows the formatting of both text prompts and multimodal information in tandem. It can also distinguish between keyword arguments meant for the text prompt and those intended for multimodal information.
- With the to_model_format method, the prompt can be converted into a model-understandable format. By default, it uses the default_to_model_format method, but custom methods can also be provided.
Code Changes:
- Added the MultiModalPrompt class with methods for initializing, formatting, and converting to a model-understandable format.
- Added default_to_model_format, which serves as the default method to format multimodal prompts for models.
Example Description for Pull Request
MultiModalPrompt Example Demonstrations
The attached example, examples/multimodal/formating_example.py, demonstrates the capabilities and practical use cases of the newly added MultiModalPrompt class for various multimodal scenarios.
Single Image VQA (Visual Question-Answering) Prompt:
Multi-Image Question with Custom Model Input Format:
- A custom input format, multi_image_input_format, is implemented, which labels images in the prompt with numbers. This indexing format is inspired by MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning.
- Tags of the form <Image{i}> are introduced in the textual prompt to indicate image positions.
- [Image{i}] acts as the visual placeholder for the i-th image in the prompt.
This example serves as a practical guide to:
- How MultiModalPrompt can be seamlessly integrated with existing prompts.
The described example not only showcases the ease of use and flexibility of the MultiModalPrompt class but also demonstrates its applicability across various real-world scenarios, emphasizing its potential utility for developers and researchers in the multimodal domain.
Future Work
- MultiModalPromptDict.
Please review the changes and provide feedback.
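For readers without the diff at hand, the behaviour described under Key Features and in the multi-image example can be sketched roughly as follows. This is a minimal illustration and not the code in camel/prompts/multimodal.py: the class name MultiModalPromptSketch, the contents of MODALITIES, the dictionary layout returned by the converters, and the exact way <Image{i}> is paired with [Image{i}] are all assumptions made for the sketch.

```python
from typing import Any, Callable, Dict, Iterable, Optional

# Hypothetical list of supported modalities; the PR's MODALITIES may differ.
MODALITIES = ("image", "audio", "video")

Converter = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def default_to_model_format(text: str, info: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed default: return the text and the multimodal data in one dict."""
    return {"prompt": text, **info}


def multi_image_input_format(text: str, info: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed custom format: pair each <Image{i}> tag with an [Image{i}] placeholder."""
    images = list(info.get("image", []))
    for i in range(len(images)):
        text = text.replace(f"<Image{i}>", f"<Image{i}> [Image{i}]")
    placeholders = {f"[Image{i}]": img for i, img in enumerate(images)}
    return {"prompt": text, "images": placeholders}


class MultiModalPromptSketch:
    """Illustrative stand-in for MultiModalPrompt, not the PR's implementation."""

    def __init__(self, text_prompt: str, modalities: Iterable[str]) -> None:
        modalities = list(modalities)
        # Validate the requested modalities against the supported list.
        unknown = [m for m in modalities if m not in MODALITIES]
        if unknown:
            raise ValueError(f"Unsupported modalities: {unknown}")
        self.text_prompt = text_prompt
        self.modalities = modalities
        self.multimodal_info: Dict[str, Any] = {}

    def format(self, **kwargs: Any) -> "MultiModalPromptSketch":
        # Keyword arguments named after a declared modality carry multimodal
        # data; all other keyword arguments fill the text placeholders.
        info = {k: v for k, v in kwargs.items() if k in self.modalities}
        text_kwargs = {k: v for k, v in kwargs.items() if k not in self.modalities}
        formatted = MultiModalPromptSketch(
            self.text_prompt.format(**text_kwargs), self.modalities
        )
        formatted.multimodal_info = {**self.multimodal_info, **info}
        return formatted

    def to_model_format(self, method: Optional[Converter] = None) -> Dict[str, Any]:
        # Fall back to the default converter when no custom one is given.
        converter = method or default_to_model_format
        return converter(self.text_prompt, self.multimodal_info)


prompt = MultiModalPromptSketch(
    "Compare <Image0> and <Image1>. {question}", modalities=["image"]
)
filled = prompt.format(
    question="Which one shows a cat?", image=["cat.png", "dog.png"]
)
result = filled.to_model_format(multi_image_input_format)
print(result["prompt"])
```

In this sketch, routing keyword arguments by modality name is what lets a single format call fill the text placeholders and attach the images at the same time, and passing a converter to to_model_format is how a model-specific layout such as the numbered-image one would be swapped in.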
Motivation and Context
Why is this change required? What problem does it solve?
close #317
Feature Request
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!