
Leaderboard 2.0: should we fetch data from model card? #1373

Closed
Tracked by #1405
KennethEnevoldsen opened this issue Nov 1, 2024 · 9 comments
Labels: leaderboard (issues related to the leaderboard)

Comments

@KennethEnevoldsen
Contributor

Currently, model metadata files become quite extensive if they cover all tasks in mteb (#1368). This makes it frustrating for users to see what scores a model has (having scores for >30 datasets doesn't really give a great overview). The ideal solution would probably be to either:

  1. Allow users to upload an mteb_results/* folder
  2. Only accept results pushed to embedding-benchmark/results

Let me know what you guys think (@Muennighoff, @imenelydiaker, @isaac-chung, @orionw, @x-tabdeveloping)

@KennethEnevoldsen changed the title from "Leaderboard 2.0: should we fetch data from model meta" to "Leaderboard 2.0: should we fetch data from model card?" Nov 1, 2024
@isaac-chung
Collaborator

It's not like they are blocked from doing so, right? One option was to use git-lfs. I see some pros and cons of this suggestion:

  • pros:
    • we can validate the scores before accepting them, e.g. in terms of structure
    • single source of truth
  • cons:
    • additional step for users to submit to the leaderboard
    • additional items for maintainers to review, though this can be alleviated by automated checks
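The structure-validation idea above could be automated in CI. A minimal sketch in Python, assuming a hypothetical result-file layout (the `task_name`/`mteb_version`/`scores` keys and the 0-1 score range are illustrative, not mteb's actual schema):

```python
# Hypothetical sketch of an automated structure check that CI on the
# results repo could run before accepting a submission. The required
# keys and score bounds below are assumptions, not mteb's real schema.
REQUIRED_KEYS = {"task_name", "mteb_version", "scores"}

def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - result.keys()]
    for split, entries in result.get("scores", {}).items():
        for entry in entries:
            score = entry.get("main_score")
            # Scores outside [0, 1] (or non-numeric ones) are flagged for review.
            if not isinstance(score, (int, float)) or not 0 <= score <= 1:
                problems.append(f"{split}: suspicious main_score {score!r}")
    return problems
```

A CI job could run a check like this over every result file touched by a submission PR and fail if any problems are reported.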

@x-tabdeveloping
Collaborator

I'd definitely prefer keeping it in one place (embedding-benchmark/results), not only because we can validate the structure of the data, but also because we retain control if results look fishy.
I think a well-curated and stable leaderboard is worth having, even if it means we have to review result submissions.
This is a place where loads of people go to choose models for their use case, so I don't mind sacrificing time to keep things clean and reputable.
I remember there were quite a few issues with this on the OpenLLM leaderboard: models only got flagged after the fact, when the damage was already done, and even today most of the top models there seem to be of dubious quality and origin.

Aside from this, not all models will have model cards (e.g. proprietary ones won't), so we'd have to keep track of this in two places at once.

@Muennighoff
Contributor

Great points, maybe let's edit the README here https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md to say that the preferred way for submissions is via PR to embedding-benchmark/results? (but leave the old way intact for now)

Another advantage of this is that we get the full result files with more metadata

@orionw
Contributor

orionw commented Nov 4, 2024

+1 to everything.

If at some point we want to switch over entirely, we could convert the metadata on HF pages into files we can PR into the results repo.

@KennethEnevoldsen
Contributor Author

I agree with everything here. The current benchmark does not support the metadata files from HF. We can either have it fetch the model metadata from HF or add CI to embedding-benchmark/results to do it. I am fine with either (whichever is easiest to implement).
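For the fetch-from-HF route, the scores would come from the `model-index` block in a model card's YAML metadata. A rough sketch of pulling them out, assuming the generic Hugging Face model-index layout (the helper name is hypothetical, and the exact structure mteb emits may differ):

```python
# Sketch of extracting per-dataset scores from a model card's
# `model-index` metadata (the YAML block HF model cards carry).
# Assumes the generic Hugging Face model-index layout:
# a list of models, each with `results` entries that hold a
# `dataset` and a list of `metrics`.
def scores_from_model_index(model_index: list[dict]) -> dict[str, float]:
    """Map dataset name -> metric value from a parsed model-index list."""
    scores = {}
    for model in model_index:
        for result in model.get("results", []):
            dataset = result.get("dataset", {}).get("name", "unknown")
            for metric in result.get("metrics", []):
                if "value" in metric:
                    scores[dataset] = float(metric["value"])
    return scores
```

The leaderboard (or the results-repo CI) could run this over the parsed card metadata to populate per-dataset scores without requiring users to submit separate result files.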

@x-tabdeveloping
Collaborator

It would be great if we at least transferred the results from the old leaderboard to embedding-benchmark/results, so that the old leaderboard can be used out of the box within the new one.

@Samoed
Collaborator

Samoed commented Nov 5, 2024

I'll try to collect all available scores

@Samoed
Collaborator

Samoed commented Nov 9, 2024

Adding a reference for visibility: embeddings-benchmark/results#43

@x-tabdeveloping
Collaborator

Moved the discussion to the referenced issue (embeddings-benchmark/results#43).

6 participants