[MIEB] Add new multimodal retrieval tasks #1523

izhx · 2024-11-28T11:59:29Z

Hi, thanks for the cool MTEB toolkit.

We are currently preparing to release an embedding model for universal multimodal retrieval, along with our compiled evaluations. I noticed that you are also developing image extensions for MTEB. So I would like to inquire if you would be interested in incorporating our testing code into MTEB, perhaps as part of MIEB retrieval.

Our test suite is divided into four parts: MTEB text retrieval, M-BEIR, ViDoRe, and a few additional it2it retrieval datasets. I guess many of them have already been incorporated into mteb.

Below is a preliminary model testing results table.

If you're interested, where could I find the docs to start with? Thanks a lot.

isaac-chung · 2024-11-28T12:38:03Z

Hey @izhx! Thanks for reaching out. Tagging @gowitheflow-1998 here as well.

We're working on integrating MIEB docs with MTEB at the moment. I think the general steps are:

1. Implement any missing retrieval tasks as Any2AnyRetrieval tasks.
2. Implement any missing models.
3. Optionally, implement a benchmark as a collection of tasks.

These would be PRs to the mieb branch.

KennethEnevoldsen · 2024-11-29T09:17:36Z

Thanks for reaching out @izhx - can you send the reference paper (can't seem to find the paper with that specific table)

izhx · 2024-11-29T10:01:40Z

> Thanks for reaching out @izhx - can you send the reference paper (can't seem to find the paper with that specific table)

Hi, we are still finalizing the results and will submit the paper to arXiv and open-source the models in about 10 days. @KennethEnevoldsen

In addition, I checked the Any2AnyRetrieval tasks and found that only 4 of our datasets are not yet included.
I will add their implementations and run the evaluation of our model with MIEB.

gowitheflow-1998 · 2024-11-29T10:08:00Z

Thanks for reaching out! Adding to @isaac-chung's comment, we welcome PRs both to improve the Any2AnyRetrieval Evaluator and add your specific tasks! We'll be happy to benchmark your model on all MIEB tasks on our end as well if you can PR your model implementation to here. An old doc for the full process can be found here.

izhx · 2024-12-11T12:46:38Z

Hi, it appears that in Any2AnyDenseRetrievalExactSearch we currently use only get_text_embeddings, get_image_embeddings, and get_fused_embeddings to encode both queries and the corpus. These functions don't differentiate between query and corpus calls.

However, the previous DenseRetrievalExactSearch for text used encode_corpus to obtain corpus embeddings, which is beneficial for models that require distinct instructions for query and corpus processing, such as GTE and our new multimodal embedding model GME.

Therefore, I'm wondering if we should add an is_query parameter to the get_xxx_embeddings functions, defaulting to True, to allow for this distinction.

This is just an example, and also my current implementation. I look forward to everyone's discussion and suggestions for better solutions.
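For illustration only (this is not the implementation referred to above), here is a minimal sketch of how an is_query flag could select between query-side and corpus-side instructions. The class name, the instruction strings, and the toy `_embed` helper are all hypothetical; only the method name get_text_embeddings and the is_query parameter with its default of True come from the discussion.

```python
from typing import List

# Hypothetical instructions; a real model would define its own wording.
QUERY_INSTRUCTION = "Find a document that matches the query: "
CORPUS_INSTRUCTION = ""


class ToyMultimodalEncoder:
    """Stand-in for a model wrapper; only the text path is shown."""

    def get_text_embeddings(
        self, texts: List[str], is_query: bool = True
    ) -> List[List[float]]:
        # Pick the instruction based on which side of retrieval is encoded.
        instruction = QUERY_INSTRUCTION if is_query else CORPUS_INSTRUCTION
        return [self._embed(instruction + text) for text in texts]

    @staticmethod
    def _embed(text: str) -> List[float]:
        # Placeholder "embedding" so the sketch runs without a real model:
        # the prompted text length and its number of spaces.
        return [float(len(text)), float(text.count(" "))]


encoder = ToyMultimodalEncoder()
# is_query defaults to True, so existing call sites keep encoding as queries.
query_vecs = encoder.get_text_embeddings(["a red bicycle"])
corpus_vecs = encoder.get_text_embeddings(["a red bicycle"], is_query=False)
```

With a default of True, callers that never pass the flag keep their current behaviour, and only the corpus-encoding call site in the evaluator would need to pass is_query=False.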

@isaac-chung @gowitheflow-1998

gowitheflow-1998 · 2024-12-11T13:19:08Z

Of course. Adding the ability to take in instructions (e.g., model-specific prompts triggered by is_query) has been the plan since the start of MIEB. Although this ability is not optimized for many image-text models, especially ones that can't naturally do interleaved encoding (e.g., CLIP-based ones), I personally think it will become the de facto standard for future models.

At the moment, a few state-of-the-art models have their own optimized formats, e.g., input_type for Voyage's multimodal 3, or the prompts that E5-V was trained on and therefore needs at inference, which we currently support in a model-specific way. These formats are mostly model-dependent: 1) some models differentiate between queries and documents, like Voyage's and yours, as you mentioned; 2) some need specific templates that stay the same across queries and documents.
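The two cases above could be captured by a small per-model prompt configuration. This is only a sketch: the `PromptPolicy` class, the model names, and the template strings are placeholders invented for illustration and do not reflect any actual model's prompts or the mieb code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPolicy:
    """Hypothetical per-model prompt configuration."""

    query_template: str = "{text}"
    document_template: str = "{text}"

    def format(self, text: str, is_query: bool) -> str:
        template = self.query_template if is_query else self.document_template
        return template.format(text=text)


# Placeholder model names and templates, not real model prompts.
POLICIES = {
    # Case 1: queries and documents are treated differently.
    "role-aware-model": PromptPolicy(
        query_template="query: {text}",
        document_template="document: {text}",
    ),
    # Case 2: one template shared by queries and documents.
    "shared-template-model": PromptPolicy(
        query_template="Summarize: {text}",
        document_template="Summarize: {text}",
    ),
    # Models that take no prompt at all are unaffected by the flag.
    "plain-model": PromptPolicy(),
}

role_aware_query = POLICIES["role-aware-model"].format("a cat", is_query=True)
role_aware_doc = POLICIES["role-aware-model"].format("a cat", is_query=False)
plain_query = POLICIES["plain-model"].format("a cat", is_query=True)
```

Under this layout, a model that does not benefit from the distinction simply uses identical templates on both sides, so passing is_query to it changes nothing.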

In general, I think it makes sense to add is_query if it doesn't affect other multi-modal models that don't benefit from it. Feel free to PR the solution if you have anything in mind! @izhx

Samoed · 2024-12-11T15:07:49Z

FYI, in the main branch, there is a PromptType enum passed to the encode function to specify whether it's a query or a passage. However, I'm not sure how this is implemented in the mieb branch.

Example

izhx · 2024-12-12T02:07:02Z

Thanks for the suggestions!
I think it would be more reasonable to follow the design in the main branch, i.e. add a prompt_type: PromptType | None = None parameter, where PromptType is:

from enum import Enum

class PromptType(str, Enum):
    query = "query"
    passage = "passage"
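As a rough sketch of how such a parameter might be consumed, the snippet below maps each PromptType to a prefix and leaves the input untouched when prompt_type is None. The `PROMPTS` mapping and the `apply_prompt` helper are hypothetical and are not part of mteb; `Optional[PromptType]` is used as the equivalent of `PromptType | None` so the sketch also runs on older Python versions.

```python
from enum import Enum
from typing import List, Optional


class PromptType(str, Enum):
    query = "query"
    passage = "passage"


# Hypothetical prefixes; real models define their own prompts.
PROMPTS = {
    PromptType.query: "query: ",
    PromptType.passage: "passage: ",
}


def apply_prompt(
    texts: List[str], prompt_type: Optional[PromptType] = None
) -> List[str]:
    """Prepend the prefix for prompt_type; None means no prompt is added."""
    prefix = "" if prompt_type is None else PROMPTS[prompt_type]
    return [prefix + text for text in texts]


as_query = apply_prompt(["a red bicycle"], prompt_type=PromptType.query)
as_passage = apply_prompt(["a red bicycle"], prompt_type=PromptType.passage)
unprompted = apply_prompt(["a red bicycle"])

# Because PromptType mixes in str, members compare equal to plain strings
# and can be recovered from them.
same_member = PromptType("query") is PromptType.query
```

Compared with a boolean is_query, the None default gives a third state for models or tasks where no role-specific prompt applies.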

isaac-chung added the mieb label (the image extension of MTEB) on Nov 29, 2024

izhx changed the title from "About MIEB, adding new multimodal retrieval tasks" to "[MIEB] Add new multimodal retrieval tasks" on Dec 4, 2024

This was referenced Dec 12, 2024

[MIEB] Make multimodal models compatible to task_name and prompt_type #1583

Merged

[MIEB] Add new multimodal retrieval tasks #1611

Merged
