
Leaderboard 2.0: should we fetch data from model card? #1373

Closed
Tracked by #1405
KennethEnevoldsen opened this issue Nov 1, 2024 · 9 comments
Labels: leaderboard (issues related to the leaderboard)

Comments

@KennethEnevoldsen
Contributor

Currently, model metadata files become quite extensive if they cover all tasks in mteb (#1368). This makes it frustrating for users to see what scores a model has (having scores for >30 datasets doesn't really give a great overview). The ideal solution would probably be to either:

  1. Allow users to upload an mteb_results/* folder
  2. Only accept results pushed to embedding-benchmark/results

Let me know what you guys think (@Muennighoff, @imenelydiaker, @isaac-chung, @orionw, @x-tabdeveloping)

@KennethEnevoldsen changed the title from "Leaderboard 2.0: should we fetch data from model meta" to "Leaderboard 2.0: should we fetch data from model card?" Nov 1, 2024
@isaac-chung
Collaborator

It's not like they are blocked from doing so, right? One option was to use git-lfs. I see some pros and cons of this suggestion:

  • pros:
    • we can validate the scores before accepting them, e.g. in terms of structure
    • single source of truth
  • cons:
    • additional step for users to submit to the leaderboard
    • additional items for maintainers to review, though this can be alleviated by automated checks
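The structure-validation idea above could be automated in CI. A minimal sketch in Python, assuming a hypothetical result-file layout (the `task_name`/`mteb_version`/`scores` keys and the 0-1 score range are illustrative, not mteb's actual schema):

```python
# Hypothetical sketch of an automated structure check that CI on the
# results repo could run before accepting a submission. The required
# keys and score bounds below are assumptions, not mteb's real schema.
REQUIRED_KEYS = {"task_name", "mteb_version", "scores"}

def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS - result.keys()]
    for split, entries in result.get("scores", {}).items():
        for entry in entries:
            score = entry.get("main_score")
            # Scores outside [0, 1] (or non-numeric ones) are flagged for review.
            if not isinstance(score, (int, float)) or not 0 <= score <= 1:
                problems.append(f"{split}: suspicious main_score {score!r}")
    return problems
```

A CI job could run a check like this over every result file touched by a submission PR and fail if any problems are reported.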

@x-tabdeveloping
Collaborator

I'd definitely prefer keeping it in one place (embedding-benchmark/results), not only because we can validate the structure of the data, but also because we retain control if results look fishy.
I think a well-curated and stable leaderboard is worth having, even if it means we have to review result submissions.
This is a place where loads of people go to choose models for their use case, so I don't mind sacrificing time to keep things clean and reputable.
I remember there were quite a few issues with this on the OpenLLM leaderboard: models only got flagged after the fact, when the damage was already done, and even today most of the top models there seem to be of dubious quality and origin.

Aside from this, not all models will have model cards (e.g. proprietary ones won't), so we'd have to keep track of this in two places at once.

@Muennighoff
Contributor

Great points, maybe let's edit the README here https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md to say that the preferred way for submissions is via PR to embedding-benchmark/results? (but leave the old way intact for now)

Another advantage of this is that we get the full result files with more metadata

@orionw
Contributor

orionw commented Nov 4, 2024

+1 to everything.

If at some point we want to switch over entirely, we could convert the metadata on HF pages into files we can PR into the results repo.

@KennethEnevoldsen
Contributor Author

I agree with everything here. The current benchmark does not support the metadata files from HF. We can either have it fetch the model metadata from HF or add CI to embedding-benchmark/results to do it. I am fine with either (whichever is easiest to implement).
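For the fetch-from-HF route, the scores would come from the `model-index` block in a model card's YAML metadata. A rough sketch of pulling them out, assuming the generic Hugging Face model-index layout (the helper name is hypothetical, and the exact structure mteb emits may differ):

```python
# Sketch of extracting per-dataset scores from a model card's
# `model-index` metadata (the YAML block HF model cards carry).
# Assumes the generic Hugging Face model-index layout:
# a list of models, each with `results` entries that hold a
# `dataset` and a list of `metrics`.
def scores_from_model_index(model_index: list[dict]) -> dict[str, float]:
    """Map dataset name -> metric value from a parsed model-index list."""
    scores = {}
    for model in model_index:
        for result in model.get("results", []):
            dataset = result.get("dataset", {}).get("name", "unknown")
            for metric in result.get("metrics", []):
                if "value" in metric:
                    scores[dataset] = float(metric["value"])
    return scores
```

The leaderboard (or the results-repo CI) could run this over the parsed card metadata to populate per-dataset scores without requiring users to submit separate result files.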

@x-tabdeveloping
Collaborator

It would be great if we at least transferred the results from the old leaderboard to embedding-benchmark/results, so that the old leaderboard can be used out of the box within the new one.

@Samoed
Collaborator

Samoed commented Nov 5, 2024

I'll try to collect all available scores

@Samoed
Collaborator

Samoed commented Nov 9, 2024

Adding a reference for visibility: embeddings-benchmark/results#43

@x-tabdeveloping
Collaborator

Moved the discussion to the referenced issue (embeddings-benchmark/results#43).

6 participants