Improve leaderboard 2.0 readability #1317

Open
7 tasks done
KennethEnevoldsen opened this issue Oct 24, 2024 · 15 comments
Labels
leaderboard (issues related to the leaderboard)

Comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented Oct 24, 2024

A couple of comments for readability:

Originally posted by @KennethEnevoldsen in #1312 (comment)

@x-tabdeveloping
Collaborator

x-tabdeveloping commented Oct 24, 2024

By fold-down menu you mean an accordion, right?

@KennethEnevoldsen
Contributor Author

It was my initial idea yes, but I suppose multiple things could work - tabs would also be an option:

Screenshot 2024-10-24 at 13 51 54

@x-tabdeveloping
Collaborator

That ain't dumb! Might try that one then.

@orionw
Contributor

orionw commented Oct 24, 2024

> multiply scores by 100 and keep one decimal, e.g. 78.1 (@orionw not sure if this also works for followIR?)

It does work for FollowIR!

Also is the v2 leaderboard up somewhere or is this a picture from development?
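For reference, the formatting rule quoted above (multiply by 100, keep one decimal) could look roughly like the sketch below. It assumes raw scores are stored as fractions in [0, 1]; `format_score` is a hypothetical helper name for illustration, not something from the mteb codebase.

```python
def format_score(score: float) -> str:
    """Render a fractional score (assumed to lie in [0, 1]) on a 0-100 scale with one decimal."""
    return f"{score * 100:.1f}"


# Example: a raw score of 0.781 is displayed as 78.1
print(format_score(0.781))
```

Returning a string keeps the trailing zero (e.g. `50.0`), which helps the columns line up in a table.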

@x-tabdeveloping
Collaborator

It's still in development. I'm using the leaderboard_2 branch for new changes. You can run it with:

from mteb.leaderboard import demo

demo.launch()

@x-tabdeveloping
Collaborator

I can host a demo version on my HF profile btw if it's something we'd be interested in having @orionw

@orionw
Contributor

orionw commented Oct 25, 2024

Ah, no problem @x-tabdeveloping! For some reason I misunderstood and thought it was already up. Thanks for the offer, but no need to add extra work during your development. It’s looking great already though! 🚀

@x-tabdeveloping
Collaborator

Here's a demo of the current version: https://huggingface.co/spaces/kardosdrur/mmteb_leaderboard_demo

@tomaarsen
Member

Thanks for sharing the dev version!

@Muennighoff
Contributor

Muennighoff commented Nov 8, 2024

The leaderboard looks really amazing! These are probably already planned, but:

  • some indication of contamination would be great as we discussed (maybe we just manually add it to the metadata for now where we know it (e.g. trained_on_{task_name}_{task_split}: true or training_datasets: [(Emotion, train), (Amazon, test), ...] or something else) and invite users to update the metadata via PR)
  • maybe adding some statistics, e.g. on the current LB we have the below at the bottom
Total Datasets: 213
Total Languages: 113
Total Scores: 88857
Total Models: 469

(could be auto-displayed per-benchmark when selecting a benchmark)

  • Maybe we want to link it with the arena somehow? (e.g. one dropdown option could be the arena and it links to the arena space; or we just have a banner at the bottom or top to motivate people to checkout the arena or similar; you probably have better ideas!)
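One possible shape for the contamination metadata from the first bullet, as a minimal sketch. The dictionary layout, the `training_datasets` key as used here, and the `flag_contamination` helper are all assumptions for illustration and not mteb's actual metadata schema; "Emotion" and "Amazon" are just the placeholder names from the bullet.

```python
# Hypothetical model metadata: (task name, split) pairs the model is known to be trained on.
model_meta = {
    "name": "example-model",
    "training_datasets": [("Emotion", "train"), ("Amazon", "test")],
}


def flag_contamination(training_datasets, evaluated):
    """Return the evaluated (task, split) pairs that also appear in the training data."""
    seen = set(training_datasets)
    return [pair for pair in evaluated if pair in seen]


# Splits the model is scored on in some benchmark (also made up for this example).
evaluated = [("Emotion", "test"), ("Amazon", "test")]
flagged = flag_contamination(model_meta["training_datasets"], evaluated)
print(flagged)
```

Here only `("Amazon", "test")` is flagged, since the model was trained on the same split it is evaluated on. A stricter check could also flag any task whose train split was seen.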

isaac-chung added the leaderboard label (issues related to the leaderboard) on Nov 9, 2024

@x-tabdeveloping
Collaborator

@Muennighoff I'm on it!

@x-tabdeveloping
Collaborator

Hey @Muennighoff what does Total scores mean?

@Muennighoff
Contributor

Total scores is the total number of scores i.e. how many numbers there are in the table. Maybe there's a better name for it 🤔
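To make the definition concrete, here is a rough sketch of how the four numbers could be computed per benchmark. It assumes results are available as a nested mapping of model to task to score, plus a mapping of task to languages; both structures and the `summarize` helper are hypothetical and not how the leaderboard stores its data.

```python
# Made-up results: model -> task -> score. Missing entries mean the model was not run on that task.
results = {
    "model-a": {"TaskX": 0.71, "TaskY": 0.64},
    "model-b": {"TaskX": 0.69},
}
# Made-up task metadata: task -> language codes.
task_languages = {"TaskX": ["eng"], "TaskY": ["eng", "dan"]}


def summarize(results, task_languages):
    """Count datasets, languages, scores, and models in a results table."""
    tasks = {task for scores in results.values() for task in scores}
    languages = {lang for task in tasks for lang in task_languages.get(task, [])}
    return {
        "Total Datasets": len(tasks),
        "Total Languages": len(languages),
        # "Total Scores" is the number of filled cells in the model x task table.
        "Total Scores": sum(len(scores) for scores in results.values()),
        "Total Models": len(results),
    }


stats = summarize(results, task_languages)
print(stats)
```

With this toy input there are 2 datasets, 2 languages, 3 scores, and 2 models; running the same function on the results filtered to one benchmark would give the per-benchmark view.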

@KennethEnevoldsen
Contributor Author

Might be worth moving integration with Arena to a separate issue (It might work well with #1432). I think it might warrant some more discussion. To begin with we could also add it to the description of MTEB(eng, beta). Something like:

"English also has an arena-style benchmark for evaluating embeddings. You can check this out here".

@x-tabdeveloping
Collaborator

I'm a bit stopped in my tracks because of glaring issues with Gradio's dataframes (1, 2). I have implemented the plot though, and will add overview info to the benchmarks' descriptions.
