fix: Add Russian models #21
# Conflicts: # all_data_tasks/0/default.jsonl # all_data_tasks/1/default.jsonl # all_data_tasks/2/default.jsonl # all_data_tasks/3/default.jsonl # all_data_tasks/4/default.jsonl # all_data_tasks/5/default.jsonl # refresh.py
Really cool work! 🚀 Somehow your leaderboard removes a lot of models for me. The changes in the cached results also indicate that lots of models get removed. Can you investigate what's happening and fix it?
Sorry, I was a bit fast with the review there. We did a PR yesterday which revealed that a lot of leaderboards weren't updated, so it is possible that it was that one (or one of the ones before it; hard to know when they weren't updated).
refresh.py
Outdated
@@ -538,13 +553,14 @@ def get_mteb_average(task_dict: dict) -> tuple[Any, dict]:
     DATA_OVERALL.insert(
         1,
         f"Average ({len(all_tasks)} datasets)",
-        DATA_OVERALL[all_tasks].mean(axis=1, skipna=False),
+        DATA_OVERALL[find_tasks(DATA_OVERALL.columns, all_tasks)].mean(axis=1, skipna=False),
was there something wrong beforehand?
I figured out that tasks were previously listed with their language subset in config.yaml, so I think I should change this too.
refresh.py
Outdated
@@ -508,7 +509,7 @@ def get_mteb_data(
     df.drop(columns=["PawsX (fr)"], inplace=True)

     # Filter invalid columns
-    cols = [col for col in cols if col in base_columns + datasets]
+    cols = [col for col in cols if col in base_columns + datasets or any([col.split()[0] == d for d in datasets])]
I am not entirely sure what happens here?
Similar problem to find_tasks
But I believe that what is in the name of the task is the Hugging Face split (not the language), and I believe it should be there for all datasets unless it is the default subset.
I'll try to make it work without find_tasks a bit later.
might also be that I am missing something
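For readers following this thread, here is a minimal sketch of what the changed filter in the hunk above does. The column and dataset names are made up for illustration; the real values come from refresh.py and config.yaml. It only shows that the new condition also keeps columns whose first whitespace-separated token is a known dataset name, for example a name followed by a subset in parentheses.

```python
# Hypothetical sample data, not the real leaderboard configuration.
base_columns = ["Model", "Rank"]
datasets = ["PawsX", "STS22"]
cols = ["Model", "PawsX (fr)", "STS22 (ru)", "STS22", "Unknown (en)"]

# Old filter from the diff: exact membership only.
old = [col for col in cols if col in base_columns + datasets]

# New filter from the diff: also accept columns whose first token
# (the part before the first whitespace) is a known dataset name.
new = [
    col
    for col in cols
    if col in base_columns + datasets
    or any(col.split()[0] == d for d in datasets)
]

print(old)
print(new)
```

With this sample input the old filter drops every column that carries a subset suffix, while the new one keeps them and still rejects names that match no dataset.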
refresh.py
Outdated
@@ -136,8 +137,8 @@ def add_lang(examples):
     return examples


-def norm(names: str) -> set:
-    return set([name.split(" ")[0] for name in names])
+def norm(names: list[str]) -> list[str]:
why can't it be a set?
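As background for this question, a small sketch of the practical difference between the two return types. The new body of norm is not shown in the hunk, so `norm_list` below is a hypothetical order-preserving variant, not the PR's implementation; `norm_set` mirrors the removed line.

```python
def norm_set(names: list[str]) -> set[str]:
    # Mirrors the removed implementation: base names, unordered.
    return set(name.split(" ")[0] for name in names)


def norm_list(names: list[str]) -> list[str]:
    # Hypothetical variant: base names, de-duplicated, first-seen order kept.
    return list(dict.fromkeys(name.split(" ")[0] for name in names))


# Made-up task names for illustration.
names = ["STS22 (ru)", "STS22 (en)", "PawsX (fr)"]
print(norm_set(names))
print(norm_list(names))
```

Both return the same base names; the list keeps a stable order, which matters if the result is later used to select or order DataFrame columns, whereas a set's iteration order is not guaranteed.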
refresh.py
Outdated
@@ -659,8 +675,7 @@ def write_out_results(item: dict, item_name: str) -> None:
     print(f"Saving {main_folder} to {main_folder}/default.jsonl")
     os.makedirs(main_folder, exist_ok=True)

-    item.reset_index(inplace=True)
-    item.to_json(f"{main_folder}/default.jsonl", orient="records", lines=True)
+    item.reset_index(drop=True).to_json(f"{main_folder}/default.jsonl", orient="records", lines=True)
This is probably the cause for removing a lot of examples. drop=True will remove items due to them having the same index (e.g. if you concat two data frames where both start their index at 1).
According to the docs, it shouldn't add the index to the JSON when orient="records". So it might be that the column was accidentally added?
After reset_index, the old index becomes an additional column, Unnamed: 0, which is then exported by to_json, so maybe we shouldn't call reset_index before the export.
Ahh right. I believe it gives an error otherwise (but if not, just remove it). Otherwise, drop the column before writing it.
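To make the behaviour discussed in this thread concrete, here is a small sketch using made-up data (the frames and column names are assumptions, not the leaderboard's). It illustrates that in pandas `reset_index(drop=True)` discards the old index without removing rows, even when index labels repeat after a concat, while plain `reset_index()` turns the old index into an extra column that `to_json(orient="records")` then writes out.

```python
import pandas as pd

# Hypothetical data: two frames whose default indexes both start at 0.
a = pd.DataFrame({"Model": ["m1", "m2"], "Score": [0.5, 0.6]})
b = pd.DataFrame({"Model": ["m3", "m4"], "Score": [0.7, 0.8]})
item = pd.concat([a, b])  # index labels are now 0, 1, 0, 1

kept = item.reset_index()              # old index becomes a column named "index"
dropped = item.reset_index(drop=True)  # old index is discarded, all rows stay

print(list(kept.columns))
print(len(item), len(dropped))

# With no path given, to_json returns the JSON Lines text as a string.
json_kept = kept.to_json(orient="records", lines=True)
json_dropped = dropped.to_json(orient="records", lines=True)
print('"index"' in json_kept, '"index"' in json_dropped)
```

So if rows go missing in the written files, the cause is more likely elsewhere in the pipeline; the choice between the two calls only decides whether the extra index column ends up in default.jsonl.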
# Conflicts: # all_data_tasks/0/default.jsonl # all_data_tasks/1/default.jsonl # all_data_tasks/10/default.jsonl # all_data_tasks/11/default.jsonl # all_data_tasks/12/default.jsonl # all_data_tasks/13/default.jsonl # all_data_tasks/14/default.jsonl # all_data_tasks/15/default.jsonl # all_data_tasks/16/default.jsonl # all_data_tasks/17/default.jsonl # all_data_tasks/18/default.jsonl # all_data_tasks/19/default.jsonl # all_data_tasks/2/default.jsonl # all_data_tasks/20/default.jsonl # all_data_tasks/21/default.jsonl # all_data_tasks/22/default.jsonl # all_data_tasks/23/default.jsonl # all_data_tasks/25/default.jsonl # all_data_tasks/27/default.jsonl # all_data_tasks/28/default.jsonl # all_data_tasks/29/default.jsonl # all_data_tasks/3/default.jsonl # all_data_tasks/30/default.jsonl # all_data_tasks/31/default.jsonl # all_data_tasks/32/default.jsonl # all_data_tasks/33/default.jsonl # all_data_tasks/34/default.jsonl # all_data_tasks/35/default.jsonl # all_data_tasks/36/default.jsonl # all_data_tasks/4/default.jsonl # all_data_tasks/5/default.jsonl # all_data_tasks/6/default.jsonl # all_data_tasks/7/default.jsonl # all_data_tasks/8/default.jsonl # all_data_tasks/9/default.jsonl # boards_data/bright/data_tasks/Retrieval/default.jsonl # boards_data/da/data_tasks/BitextMining/default.jsonl # boards_data/da/data_tasks/Classification/default.jsonl # boards_data/de/data_tasks/Clustering/default.jsonl # boards_data/en-x/data_tasks/BitextMining/default.jsonl # boards_data/en/data_overall/default.jsonl # boards_data/en/data_tasks/Classification/default.jsonl # boards_data/en/data_tasks/Clustering/default.jsonl # boards_data/en/data_tasks/PairClassification/default.jsonl # boards_data/en/data_tasks/Reranking/default.jsonl # boards_data/en/data_tasks/Retrieval/default.jsonl # boards_data/en/data_tasks/STS/default.jsonl # boards_data/en/data_tasks/Summarization/default.jsonl # boards_data/fr/data_overall/default.jsonl # boards_data/fr/data_tasks/Classification/default.jsonl # 
boards_data/fr/data_tasks/Clustering/default.jsonl # boards_data/fr/data_tasks/PairClassification/default.jsonl # boards_data/fr/data_tasks/Reranking/default.jsonl # boards_data/fr/data_tasks/Retrieval/default.jsonl # boards_data/fr/data_tasks/STS/default.jsonl # boards_data/fr/data_tasks/Summarization/default.jsonl # boards_data/instructions/data_tasks/InstructionRetrieval/default.jsonl # boards_data/law/data_tasks/Retrieval/default.jsonl # boards_data/longembed/data_tasks/Retrieval/default.jsonl # boards_data/no/data_tasks/Classification/default.jsonl # boards_data/other-cls/data_tasks/Classification/default.jsonl # boards_data/other-sts/data_tasks/STS/default.jsonl # boards_data/pl/data_overall/default.jsonl # boards_data/pl/data_tasks/Classification/default.jsonl # boards_data/pl/data_tasks/Clustering/default.jsonl # boards_data/pl/data_tasks/PairClassification/default.jsonl # boards_data/pl/data_tasks/Retrieval/default.jsonl # boards_data/pl/data_tasks/STS/default.jsonl # boards_data/rar-b/data_tasks/Retrieval/default.jsonl # boards_data/se/data_tasks/Classification/default.jsonl # boards_data/zh/data_overall/default.jsonl # boards_data/zh/data_tasks/Classification/default.jsonl # boards_data/zh/data_tasks/Clustering/default.jsonl # boards_data/zh/data_tasks/PairClassification/default.jsonl # boards_data/zh/data_tasks/Reranking/default.jsonl # boards_data/zh/data_tasks/Retrieval/default.jsonl # boards_data/zh/data_tasks/STS/default.jsonl # model_meta.yaml # refresh.py
# Conflicts: # all_data_tasks/0/default.jsonl # all_data_tasks/1/default.jsonl # all_data_tasks/10/default.jsonl # all_data_tasks/11/default.jsonl # all_data_tasks/12/default.jsonl # all_data_tasks/13/default.jsonl # all_data_tasks/16/default.jsonl # all_data_tasks/17/default.jsonl # all_data_tasks/18/default.jsonl # all_data_tasks/19/default.jsonl # all_data_tasks/2/default.jsonl # all_data_tasks/20/default.jsonl # all_data_tasks/21/default.jsonl # all_data_tasks/22/default.jsonl # all_data_tasks/28/default.jsonl # all_data_tasks/29/default.jsonl # all_data_tasks/3/default.jsonl # all_data_tasks/30/default.jsonl # all_data_tasks/31/default.jsonl # all_data_tasks/32/default.jsonl # all_data_tasks/34/default.jsonl # all_data_tasks/35/default.jsonl # all_data_tasks/4/default.jsonl # all_data_tasks/5/default.jsonl # all_data_tasks/6/default.jsonl # all_data_tasks/8/default.jsonl # all_data_tasks/9/default.jsonl # boards_data/en/data_overall/default.jsonl # boards_data/en/data_tasks/Classification/default.jsonl # boards_data/en/data_tasks/Clustering/default.jsonl # boards_data/en/data_tasks/PairClassification/default.jsonl # boards_data/en/data_tasks/Reranking/default.jsonl # boards_data/en/data_tasks/Retrieval/default.jsonl # boards_data/en/data_tasks/STS/default.jsonl # boards_data/en/data_tasks/Summarization/default.jsonl # boards_data/fr/data_overall/default.jsonl # boards_data/fr/data_tasks/Classification/default.jsonl # boards_data/fr/data_tasks/Clustering/default.jsonl # boards_data/fr/data_tasks/PairClassification/default.jsonl # boards_data/fr/data_tasks/Reranking/default.jsonl # boards_data/fr/data_tasks/Retrieval/default.jsonl # boards_data/fr/data_tasks/STS/default.jsonl # boards_data/fr/data_tasks/Summarization/default.jsonl # boards_data/other-cls/data_tasks/Classification/default.jsonl # boards_data/other-sts/data_tasks/STS/default.jsonl # boards_data/pl/data_overall/default.jsonl # boards_data/pl/data_tasks/Classification/default.jsonl # 
boards_data/pl/data_tasks/Clustering/default.jsonl # boards_data/pl/data_tasks/PairClassification/default.jsonl # boards_data/pl/data_tasks/Retrieval/default.jsonl # boards_data/pl/data_tasks/STS/default.jsonl # boards_data/zh/data_overall/default.jsonl # boards_data/zh/data_tasks/Classification/default.jsonl # boards_data/zh/data_tasks/Clustering/default.jsonl # boards_data/zh/data_tasks/PairClassification/default.jsonl # boards_data/zh/data_tasks/Reranking/default.jsonl # boards_data/zh/data_tasks/Retrieval/default.jsonl # boards_data/zh/data_tasks/STS/default.jsonl # model_meta.yaml
# Conflicts: # EXTERNAL_MODEL_RESULTS.json # all_data_tasks/0/default.jsonl # all_data_tasks/1/default.jsonl # all_data_tasks/10/default.jsonl # all_data_tasks/11/default.jsonl # all_data_tasks/12/default.jsonl # all_data_tasks/13/default.jsonl # all_data_tasks/14/default.jsonl # all_data_tasks/15/default.jsonl # all_data_tasks/16/default.jsonl # all_data_tasks/17/default.jsonl # all_data_tasks/18/default.jsonl # all_data_tasks/19/default.jsonl # all_data_tasks/2/default.jsonl # all_data_tasks/20/default.jsonl # all_data_tasks/21/default.jsonl # all_data_tasks/22/default.jsonl # all_data_tasks/23/default.jsonl # all_data_tasks/25/default.jsonl # all_data_tasks/26/default.jsonl # all_data_tasks/28/default.jsonl # all_data_tasks/29/default.jsonl # all_data_tasks/3/default.jsonl # all_data_tasks/30/default.jsonl # all_data_tasks/31/default.jsonl # all_data_tasks/32/default.jsonl # all_data_tasks/33/default.jsonl # all_data_tasks/35/default.jsonl # all_data_tasks/36/default.jsonl # all_data_tasks/4/default.jsonl # all_data_tasks/5/default.jsonl # all_data_tasks/6/default.jsonl # all_data_tasks/8/default.jsonl # all_data_tasks/9/default.jsonl # boards_data/da/data_tasks/BitextMining/default.jsonl # boards_data/da/data_tasks/Classification/default.jsonl # boards_data/en/data_overall/default.jsonl # boards_data/en/data_tasks/Classification/default.jsonl # boards_data/en/data_tasks/Clustering/default.jsonl # boards_data/en/data_tasks/PairClassification/default.jsonl # boards_data/en/data_tasks/Reranking/default.jsonl # boards_data/en/data_tasks/Retrieval/default.jsonl # boards_data/en/data_tasks/STS/default.jsonl # boards_data/en/data_tasks/Summarization/default.jsonl # boards_data/fr/data_overall/default.jsonl # boards_data/fr/data_tasks/Classification/default.jsonl # boards_data/fr/data_tasks/Clustering/default.jsonl # boards_data/fr/data_tasks/PairClassification/default.jsonl # boards_data/fr/data_tasks/Reranking/default.jsonl # 
boards_data/fr/data_tasks/Retrieval/default.jsonl # boards_data/fr/data_tasks/STS/default.jsonl # boards_data/fr/data_tasks/Summarization/default.jsonl # boards_data/law/data_tasks/Retrieval/default.jsonl # boards_data/longembed/data_tasks/Retrieval/default.jsonl # boards_data/no/data_tasks/Classification/default.jsonl # boards_data/other-sts/data_tasks/STS/default.jsonl # boards_data/pl/data_overall/default.jsonl # boards_data/pl/data_tasks/Classification/default.jsonl # boards_data/pl/data_tasks/Clustering/default.jsonl # boards_data/pl/data_tasks/PairClassification/default.jsonl # boards_data/pl/data_tasks/Retrieval/default.jsonl # boards_data/pl/data_tasks/STS/default.jsonl # boards_data/rar-b/data_tasks/Retrieval/default.jsonl # boards_data/se/data_tasks/Classification/default.jsonl # boards_data/zh/data_overall/default.jsonl # boards_data/zh/data_tasks/Classification/default.jsonl # boards_data/zh/data_tasks/Clustering/default.jsonl # boards_data/zh/data_tasks/PairClassification/default.jsonl # boards_data/zh/data_tasks/Reranking/default.jsonl # boards_data/zh/data_tasks/Retrieval/default.jsonl # boards_data/zh/data_tasks/STS/default.jsonl
# Conflicts: # EXTERNAL_MODEL_RESULTS.json # all_data_tasks/0/default.jsonl # all_data_tasks/1/default.jsonl # all_data_tasks/10/default.jsonl # all_data_tasks/11/default.jsonl # all_data_tasks/12/default.jsonl # all_data_tasks/13/default.jsonl # all_data_tasks/15/default.jsonl # all_data_tasks/16/default.jsonl # all_data_tasks/17/default.jsonl # all_data_tasks/18/default.jsonl # all_data_tasks/19/default.jsonl # all_data_tasks/2/default.jsonl # all_data_tasks/20/default.jsonl # all_data_tasks/21/default.jsonl # all_data_tasks/22/default.jsonl # all_data_tasks/23/default.jsonl # all_data_tasks/28/default.jsonl # all_data_tasks/29/default.jsonl # all_data_tasks/3/default.jsonl # all_data_tasks/30/default.jsonl # all_data_tasks/31/default.jsonl # all_data_tasks/32/default.jsonl # all_data_tasks/33/default.jsonl # all_data_tasks/34/default.jsonl # all_data_tasks/35/default.jsonl # all_data_tasks/36/default.jsonl # all_data_tasks/4/default.jsonl # all_data_tasks/5/default.jsonl # all_data_tasks/6/default.jsonl # all_data_tasks/8/default.jsonl # all_data_tasks/9/default.jsonl # boards_data/da/data_tasks/Classification/default.jsonl # boards_data/en/data_overall/default.jsonl # boards_data/en/data_tasks/Classification/default.jsonl # boards_data/en/data_tasks/Clustering/default.jsonl # boards_data/en/data_tasks/PairClassification/default.jsonl # boards_data/en/data_tasks/Reranking/default.jsonl # boards_data/en/data_tasks/Retrieval/default.jsonl # boards_data/en/data_tasks/STS/default.jsonl # boards_data/en/data_tasks/Summarization/default.jsonl # boards_data/fr/data_overall/default.jsonl # boards_data/fr/data_tasks/Classification/default.jsonl # boards_data/fr/data_tasks/Clustering/default.jsonl # boards_data/fr/data_tasks/PairClassification/default.jsonl # boards_data/fr/data_tasks/Reranking/default.jsonl # boards_data/fr/data_tasks/Retrieval/default.jsonl # boards_data/fr/data_tasks/STS/default.jsonl # boards_data/fr/data_tasks/Summarization/default.jsonl # 
boards_data/no/data_tasks/Classification/default.jsonl # boards_data/other-cls/data_tasks/Classification/default.jsonl # boards_data/other-sts/data_tasks/STS/default.jsonl # boards_data/pl/data_overall/default.jsonl # boards_data/pl/data_tasks/Classification/default.jsonl # boards_data/pl/data_tasks/Clustering/default.jsonl # boards_data/pl/data_tasks/PairClassification/default.jsonl # boards_data/pl/data_tasks/Retrieval/default.jsonl # boards_data/pl/data_tasks/STS/default.jsonl # boards_data/rar-b/data_tasks/Retrieval/default.jsonl # boards_data/se/data_tasks/Classification/default.jsonl # boards_data/zh/data_overall/default.jsonl # boards_data/zh/data_tasks/Classification/default.jsonl # boards_data/zh/data_tasks/Clustering/default.jsonl # boards_data/zh/data_tasks/PairClassification/default.jsonl # boards_data/zh/data_tasks/Reranking/default.jsonl # boards_data/zh/data_tasks/Retrieval/default.jsonl # boards_data/zh/data_tasks/STS/default.jsonl
@KennethEnevoldsen @Muennighoff Can you take a look at the PR, please?
Only minor changes; otherwise I believe it looks reasonable.
all_data_tasks/0/default.jsonl
Outdated
There should be no need to change these files (it is done during the CI). I would avoid pushing them.
I'll merge them after the leaderboard update. I was checking how everything was working.
@Muennighoff will you have the time to review this as well to ensure that we don't break the leaderboard?
I've created Russian benchmark boards and added results from embeddings-benchmark/results#11