Allow aggregated tasks within benchmarks #1231

KennethEnevoldsen · 2024-09-23T10:45:00Z

We currently have only one aggregated task (CQADupstack); however, we can definitely imagine more in the future (e.g. for CoIR in embeddings-benchmark/leaderboard#27).

A proposed solution is to reuse the benchmark object (a benchmark is already a group of tasks) and allow a benchmark to be a list[task | benchmark], so that a benchmark can contain other benchmarks.

This will require updates to MTEB.MTEB, as well as create_meta and potentially the CLI.

This approach should also solve: #1171
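
For illustration, a minimal sketch of what a nested list[task | benchmark] could look like. The `Task` and `Benchmark` classes and the `iter_tasks` method below are hypothetical stand-ins, not the existing mteb classes, and the task names are only examples:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a single evaluation task."""

    name: str


@dataclass
class Benchmark:
    """Hypothetical benchmark whose members are tasks or nested benchmarks."""

    name: str
    tasks: list[Union[Task, "Benchmark"]] = field(default_factory=list)

    def iter_tasks(self) -> Iterator[Task]:
        """Yield every leaf task, descending into nested benchmarks."""
        for member in self.tasks:
            if isinstance(member, Benchmark):
                yield from member.iter_tasks()
            else:
                yield member


# An aggregated task expressed as a benchmark nested inside another benchmark.
cqadupstack = Benchmark(
    "CQADupstack",
    [Task("CQADupstackAndroidRetrieval"), Task("CQADupstackEnglishRetrieval")],
)
example = Benchmark("ExampleBenchmark", [Task("SomeOtherTask"), cqadupstack])

leaf_names = [task.name for task in example.iter_tasks()]
```

The runner would then only need to walk the leaf tasks, while the nesting records which results belong together.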

Samoed · 2024-09-24T11:30:05Z

I think an average result across the subsets could be added for multilingual datasets.

KennethEnevoldsen · 2024-09-24T18:37:01Z

I'm not entirely sure what you mean, @Samoed. Should we add it for multilingual datasets? (Isn't that already there?)

Samoed · 2024-09-24T18:51:40Z

Yes, the author of the COIR benchmark wanted an average score for the task. I believe this can be done if all subsets of the task are included in the results. This could also be implemented in the results repository. Currently, there are some tasks where the average is calculated.

KennethEnevoldsen · 2024-09-24T19:53:57Z

This seems like a quick fix (which I am more than happy to add for now), but it does not state, within the benchmark specification in mteb, how the scores should be aggregated.
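
As a rough sketch of what specifying the aggregation in the benchmark itself could mean: the `AggregateSpec` class, its fields, and the default of a plain mean are all assumptions for illustration, not part of mteb:

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence, Union


@dataclass
class AggregateSpec:
    """Hypothetical spec: named members plus the rule for combining their scores."""

    name: str
    members: Sequence[Union[str, "AggregateSpec"]]
    aggregate: Callable[[Sequence[float]], float] = mean  # assumed default

    def score(self, task_scores: dict[str, float]) -> float:
        """Combine member scores; nested specs are scored recursively first."""
        values = [
            m.score(task_scores) if isinstance(m, AggregateSpec) else task_scores[m]
            for m in self.members
        ]
        return self.aggregate(values)


# Made-up scores keyed by made-up subset names, only to show the call shape.
subset_scores = {"subset_a": 0.5, "subset_b": 1.0, "other_task": 0.25}
inner = AggregateSpec("aggregated_task", ["subset_a", "subset_b"])
outer = AggregateSpec("benchmark", [inner, "other_task"])
```

Here `inner.score(subset_scores)` averages the two subsets, and `outer.score(subset_scores)` averages that result with `other_task`, so the aggregation rule lives next to the list of members rather than in the results repository.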

KennethEnevoldsen mentioned this issue Sep 23, 2024

Update COIR default.jsonl embeddings-benchmark/leaderboard#27

Closed
