fix: Add Russian models #21

Merged
Muennighoff merged 14 commits into embeddings-benchmark:main on Aug 6, 2024

Conversation

Contributor

@Samoed Samoed commented Jul 28, 2024

I've created Russian benchmark boards and added the results from embeddings-benchmark/results#11.

# Conflicts:
#	all_data_tasks/0/default.jsonl
#	all_data_tasks/1/default.jsonl
#	all_data_tasks/2/default.jsonl
#	all_data_tasks/3/default.jsonl
#	all_data_tasks/4/default.jsonl
#	all_data_tasks/5/default.jsonl
#	refresh.py
Contributor

@Muennighoff Muennighoff left a comment

Really cool work! 🚀 Somehow your leaderboard removes a lot of models for me. The changes in the cached results also indicate that lots of models get removed. Can you investigate what's happening and fix it?

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Sorry, I was a bit fast with the review there. We did a PR yesterday which revealed that a lot of leaderboards weren't updated, so it is possible that the removals came from that one (or from one of the ones before it; it is hard to know when they weren't updated).

refresh.py Outdated
@@ -538,13 +553,14 @@ def get_mteb_average(task_dict: dict) -> tuple[Any, dict]:
     DATA_OVERALL.insert(
         1,
         f"Average ({len(all_tasks)} datasets)",
-        DATA_OVERALL[all_tasks].mean(axis=1, skipna=False),
+        DATA_OVERALL[find_tasks(DATA_OVERALL.columns, all_tasks)].mean(axis=1, skipna=False),
Contributor

Was there something wrong beforehand?

Contributor Author

@Samoed Samoed Jul 29, 2024

I figured out that tasks were previously listed with their language subset in config.yaml, so I think I should change this too.
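
To make the hunk above concrete, here is a minimal sketch of averaging over whichever columns correspond to the configured tasks. It assumes column names follow a "Task (subset)" pattern. `match_columns` is a hypothetical stand-in and not the PR's `find_tasks`, whose body is not shown in this thread; the task names and scores are made up.

```python
import pandas as pd


def match_columns(columns, tasks):
    """Hypothetical helper: keep columns whose base name
    (the text before the first space) is a configured task."""
    wanted = set(tasks)
    return [col for col in columns if col.split(" ")[0] in wanted]


# Made-up data: one task column carries a subset suffix, one does not.
df = pd.DataFrame(
    {
        "Model": ["model-a", "model-b"],
        "TaskA (ru)": [60.0, 70.0],
        "TaskB": [80.0, None],
    }
)
tasks = ["TaskA", "TaskB"]

cols = match_columns(df.columns, tasks)
# skipna=False: a model missing any task gets NaN rather than a partial average.
average = df[cols].mean(axis=1, skipna=False)
```

With exact name matching, `"TaskA (ru)"` would not be found under the key `"TaskA"`, which is the mismatch the comment describes.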

refresh.py Outdated
@@ -508,7 +509,7 @@ def get_mteb_data(
     df.drop(columns=["PawsX (fr)"], inplace=True)

     # Filter invalid columns
-    cols = [col for col in cols if col in base_columns + datasets]
+    cols = [col for col in cols if col in base_columns + datasets or any([col.split()[0] == d for d in datasets])]
Contributor

I am not entirely sure what happens here.

Contributor Author

@Samoed Samoed Jul 29, 2024

It is a similar problem to the one find_tasks addresses.

Contributor

But I believe that what appears in the task name is the Hugging Face split (not the language), and I believe it should be there for all datasets unless it is the default subset.

Contributor Author

I'll try to do it without find_tasks a bit later.

Contributor

It might also be that I am missing something.
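
For readers following this thread, the difference between the two filter conditions can be sketched in plain Python. The column and dataset names below are made up for illustration, and the relaxed check uses a set lookup in place of the `any([...])` in the hunk, which should behave the same way for non-empty column names.

```python
base_columns = ["Model", "Rank"]
datasets = ["TaskA", "TaskB"]
cols = ["Model", "TaskA (fr)", "TaskB", "Unrelated (fr)"]

# Old condition: exact match only, so the suffixed column is filtered out.
exact = [col for col in cols if col in base_columns + datasets]

# New condition: also accept a column whose first whitespace-separated
# token equals a dataset name.
allowed = set(datasets)
relaxed = [
    col
    for col in cols
    if col in base_columns + datasets or col.split()[0] in allowed
]

print(exact)
print(relaxed)
```

One consequence of the relaxed check is that every suffixed variant of a listed dataset passes the filter, whatever the suffix denotes.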

refresh.py Outdated
@@ -136,8 +137,8 @@ def add_lang(examples):
     return examples


-def norm(names: str) -> set:
-    return set([name.split(" ")[0] for name in names])
+def norm(names: list[str]) -> list[str]:
Contributor

Why can't it be a set?
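
The thread does not record why the return type changed, so the following is only a sketch of the general trade-off between the two return types, with made-up task names. `norm_set` mirrors the removed line; `norm_list` is an assumed body for the new signature, since the hunk shows only its `def` line.

```python
def norm_set(names):
    # Mirrors the removed implementation: base names as a set.
    return set(name.split(" ")[0] for name in names)


def norm_list(names):
    # Assumed list-returning variant: keeps order and duplicates.
    return [name.split(" ")[0] for name in names]


names = ["TaskB (en)", "TaskA (fr)", "TaskB (de)"]

as_set = norm_set(names)    # unordered, duplicates collapsed
as_list = norm_list(names)  # same length and order as the input

# Order-preserving de-duplication, if both properties are wanted.
unique_in_order = list(dict.fromkeys(as_list))
```

A list stays aligned with the input (one base name per column), while a set is cheaper for membership tests; which matters here depends on how the caller uses the result.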

refresh.py Outdated
@@ -659,8 +675,7 @@ def write_out_results(item: dict, item_name: str) -> None:
     print(f"Saving {main_folder} to {main_folder}/default.jsonl")
     os.makedirs(main_folder, exist_ok=True)

-    item.reset_index(inplace=True)
-    item.to_json(f"{main_folder}/default.jsonl", orient="records", lines=True)
+    item.reset_index(drop=True).to_json(f"{main_folder}/default.jsonl", orient="records", lines=True)
Contributor

This is probably the cause of so many examples being removed. drop=True will remove items due to them having the same index (e.g. if you concat two data frames where both start their index at 1).

Contributor Author

@Samoed Samoed Jul 29, 2024

Maybe. I was trying to remove the Index_0 column from the result tables. I'll try to handle this in to_json instead.
[image]

Contributor

According to the docs, it shouldn't add the index to the JSON when orient="records". So it might be that the column was accidentally added?

Contributor Author

After reset_index, the index becomes an additional column, Unnamed: 0, which is then exported by to_json, so maybe we shouldn't call reset_index before the export.

Contributor

Ahh, right. I believe it gives an error otherwise (but if not, just remove it). Otherwise, drop the column before writing it.
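
A small experiment can help settle what each variant writes. This is a sketch on made-up data, assuming a frame built by concatenating two frames so that index values repeat. As far as pandas behaves here, the row count should be the same in all three cases; what differs is whether the old index shows up as an extra column in the records.

```python
import json

import pandas as pd

a = pd.DataFrame({"Model": ["m1", "m2"]})
b = pd.DataFrame({"Model": ["m3", "m4"]})
item = pd.concat([a, b])  # index is 0, 1, 0, 1 (duplicates)


def to_records(df):
    # to_json returns a string when no path is given; parse it line by line.
    text = df.to_json(orient="records", lines=True)
    return [json.loads(line) for line in text.splitlines() if line]


plain = to_records(item)                           # index not written
with_index_col = to_records(item.reset_index())    # old index becomes a column
dropped = to_records(item.reset_index(drop=True))  # old index discarded

print(len(plain), len(with_index_col), len(dropped))
print(with_index_col[0])
```

In this sketch the extra column is named `index` because the original index is unnamed; a column read back from a file can carry a different name, such as the `Unnamed: 0` mentioned above.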

# Conflicts:
#	all_data_tasks/0/default.jsonl
#	all_data_tasks/1/default.jsonl
#	all_data_tasks/10/default.jsonl
#	all_data_tasks/11/default.jsonl
#	all_data_tasks/12/default.jsonl
#	all_data_tasks/13/default.jsonl
#	all_data_tasks/14/default.jsonl
#	all_data_tasks/15/default.jsonl
#	all_data_tasks/16/default.jsonl
#	all_data_tasks/17/default.jsonl
#	all_data_tasks/18/default.jsonl
#	all_data_tasks/19/default.jsonl
#	all_data_tasks/2/default.jsonl
#	all_data_tasks/20/default.jsonl
#	all_data_tasks/21/default.jsonl
#	all_data_tasks/22/default.jsonl
#	all_data_tasks/23/default.jsonl
#	all_data_tasks/25/default.jsonl
#	all_data_tasks/27/default.jsonl
#	all_data_tasks/28/default.jsonl
#	all_data_tasks/29/default.jsonl
#	all_data_tasks/3/default.jsonl
#	all_data_tasks/30/default.jsonl
#	all_data_tasks/31/default.jsonl
#	all_data_tasks/32/default.jsonl
#	all_data_tasks/33/default.jsonl
#	all_data_tasks/34/default.jsonl
#	all_data_tasks/35/default.jsonl
#	all_data_tasks/36/default.jsonl
#	all_data_tasks/4/default.jsonl
#	all_data_tasks/5/default.jsonl
#	all_data_tasks/6/default.jsonl
#	all_data_tasks/7/default.jsonl
#	all_data_tasks/8/default.jsonl
#	all_data_tasks/9/default.jsonl
#	boards_data/bright/data_tasks/Retrieval/default.jsonl
#	boards_data/da/data_tasks/BitextMining/default.jsonl
#	boards_data/da/data_tasks/Classification/default.jsonl
#	boards_data/de/data_tasks/Clustering/default.jsonl
#	boards_data/en-x/data_tasks/BitextMining/default.jsonl
#	boards_data/en/data_overall/default.jsonl
#	boards_data/en/data_tasks/Classification/default.jsonl
#	boards_data/en/data_tasks/Clustering/default.jsonl
#	boards_data/en/data_tasks/PairClassification/default.jsonl
#	boards_data/en/data_tasks/Reranking/default.jsonl
#	boards_data/en/data_tasks/Retrieval/default.jsonl
#	boards_data/en/data_tasks/STS/default.jsonl
#	boards_data/en/data_tasks/Summarization/default.jsonl
#	boards_data/fr/data_overall/default.jsonl
#	boards_data/fr/data_tasks/Classification/default.jsonl
#	boards_data/fr/data_tasks/Clustering/default.jsonl
#	boards_data/fr/data_tasks/PairClassification/default.jsonl
#	boards_data/fr/data_tasks/Reranking/default.jsonl
#	boards_data/fr/data_tasks/Retrieval/default.jsonl
#	boards_data/fr/data_tasks/STS/default.jsonl
#	boards_data/fr/data_tasks/Summarization/default.jsonl
#	boards_data/instructions/data_tasks/InstructionRetrieval/default.jsonl
#	boards_data/law/data_tasks/Retrieval/default.jsonl
#	boards_data/longembed/data_tasks/Retrieval/default.jsonl
#	boards_data/no/data_tasks/Classification/default.jsonl
#	boards_data/other-cls/data_tasks/Classification/default.jsonl
#	boards_data/other-sts/data_tasks/STS/default.jsonl
#	boards_data/pl/data_overall/default.jsonl
#	boards_data/pl/data_tasks/Classification/default.jsonl
#	boards_data/pl/data_tasks/Clustering/default.jsonl
#	boards_data/pl/data_tasks/PairClassification/default.jsonl
#	boards_data/pl/data_tasks/Retrieval/default.jsonl
#	boards_data/pl/data_tasks/STS/default.jsonl
#	boards_data/rar-b/data_tasks/Retrieval/default.jsonl
#	boards_data/se/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_overall/default.jsonl
#	boards_data/zh/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_tasks/Clustering/default.jsonl
#	boards_data/zh/data_tasks/PairClassification/default.jsonl
#	boards_data/zh/data_tasks/Reranking/default.jsonl
#	boards_data/zh/data_tasks/Retrieval/default.jsonl
#	boards_data/zh/data_tasks/STS/default.jsonl
#	model_meta.yaml
#	refresh.py
# Conflicts:
#	all_data_tasks/0/default.jsonl
#	all_data_tasks/1/default.jsonl
#	all_data_tasks/10/default.jsonl
#	all_data_tasks/11/default.jsonl
#	all_data_tasks/12/default.jsonl
#	all_data_tasks/13/default.jsonl
#	all_data_tasks/16/default.jsonl
#	all_data_tasks/17/default.jsonl
#	all_data_tasks/18/default.jsonl
#	all_data_tasks/19/default.jsonl
#	all_data_tasks/2/default.jsonl
#	all_data_tasks/20/default.jsonl
#	all_data_tasks/21/default.jsonl
#	all_data_tasks/22/default.jsonl
#	all_data_tasks/28/default.jsonl
#	all_data_tasks/29/default.jsonl
#	all_data_tasks/3/default.jsonl
#	all_data_tasks/30/default.jsonl
#	all_data_tasks/31/default.jsonl
#	all_data_tasks/32/default.jsonl
#	all_data_tasks/34/default.jsonl
#	all_data_tasks/35/default.jsonl
#	all_data_tasks/4/default.jsonl
#	all_data_tasks/5/default.jsonl
#	all_data_tasks/6/default.jsonl
#	all_data_tasks/8/default.jsonl
#	all_data_tasks/9/default.jsonl
#	boards_data/en/data_overall/default.jsonl
#	boards_data/en/data_tasks/Classification/default.jsonl
#	boards_data/en/data_tasks/Clustering/default.jsonl
#	boards_data/en/data_tasks/PairClassification/default.jsonl
#	boards_data/en/data_tasks/Reranking/default.jsonl
#	boards_data/en/data_tasks/Retrieval/default.jsonl
#	boards_data/en/data_tasks/STS/default.jsonl
#	boards_data/en/data_tasks/Summarization/default.jsonl
#	boards_data/fr/data_overall/default.jsonl
#	boards_data/fr/data_tasks/Classification/default.jsonl
#	boards_data/fr/data_tasks/Clustering/default.jsonl
#	boards_data/fr/data_tasks/PairClassification/default.jsonl
#	boards_data/fr/data_tasks/Reranking/default.jsonl
#	boards_data/fr/data_tasks/Retrieval/default.jsonl
#	boards_data/fr/data_tasks/STS/default.jsonl
#	boards_data/fr/data_tasks/Summarization/default.jsonl
#	boards_data/other-cls/data_tasks/Classification/default.jsonl
#	boards_data/other-sts/data_tasks/STS/default.jsonl
#	boards_data/pl/data_overall/default.jsonl
#	boards_data/pl/data_tasks/Classification/default.jsonl
#	boards_data/pl/data_tasks/Clustering/default.jsonl
#	boards_data/pl/data_tasks/PairClassification/default.jsonl
#	boards_data/pl/data_tasks/Retrieval/default.jsonl
#	boards_data/pl/data_tasks/STS/default.jsonl
#	boards_data/zh/data_overall/default.jsonl
#	boards_data/zh/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_tasks/Clustering/default.jsonl
#	boards_data/zh/data_tasks/PairClassification/default.jsonl
#	boards_data/zh/data_tasks/Reranking/default.jsonl
#	boards_data/zh/data_tasks/Retrieval/default.jsonl
#	boards_data/zh/data_tasks/STS/default.jsonl
#	model_meta.yaml
# Conflicts:
#	EXTERNAL_MODEL_RESULTS.json
#	all_data_tasks/0/default.jsonl
#	all_data_tasks/1/default.jsonl
#	all_data_tasks/10/default.jsonl
#	all_data_tasks/11/default.jsonl
#	all_data_tasks/12/default.jsonl
#	all_data_tasks/13/default.jsonl
#	all_data_tasks/14/default.jsonl
#	all_data_tasks/15/default.jsonl
#	all_data_tasks/16/default.jsonl
#	all_data_tasks/17/default.jsonl
#	all_data_tasks/18/default.jsonl
#	all_data_tasks/19/default.jsonl
#	all_data_tasks/2/default.jsonl
#	all_data_tasks/20/default.jsonl
#	all_data_tasks/21/default.jsonl
#	all_data_tasks/22/default.jsonl
#	all_data_tasks/23/default.jsonl
#	all_data_tasks/25/default.jsonl
#	all_data_tasks/26/default.jsonl
#	all_data_tasks/28/default.jsonl
#	all_data_tasks/29/default.jsonl
#	all_data_tasks/3/default.jsonl
#	all_data_tasks/30/default.jsonl
#	all_data_tasks/31/default.jsonl
#	all_data_tasks/32/default.jsonl
#	all_data_tasks/33/default.jsonl
#	all_data_tasks/35/default.jsonl
#	all_data_tasks/36/default.jsonl
#	all_data_tasks/4/default.jsonl
#	all_data_tasks/5/default.jsonl
#	all_data_tasks/6/default.jsonl
#	all_data_tasks/8/default.jsonl
#	all_data_tasks/9/default.jsonl
#	boards_data/da/data_tasks/BitextMining/default.jsonl
#	boards_data/da/data_tasks/Classification/default.jsonl
#	boards_data/en/data_overall/default.jsonl
#	boards_data/en/data_tasks/Classification/default.jsonl
#	boards_data/en/data_tasks/Clustering/default.jsonl
#	boards_data/en/data_tasks/PairClassification/default.jsonl
#	boards_data/en/data_tasks/Reranking/default.jsonl
#	boards_data/en/data_tasks/Retrieval/default.jsonl
#	boards_data/en/data_tasks/STS/default.jsonl
#	boards_data/en/data_tasks/Summarization/default.jsonl
#	boards_data/fr/data_overall/default.jsonl
#	boards_data/fr/data_tasks/Classification/default.jsonl
#	boards_data/fr/data_tasks/Clustering/default.jsonl
#	boards_data/fr/data_tasks/PairClassification/default.jsonl
#	boards_data/fr/data_tasks/Reranking/default.jsonl
#	boards_data/fr/data_tasks/Retrieval/default.jsonl
#	boards_data/fr/data_tasks/STS/default.jsonl
#	boards_data/fr/data_tasks/Summarization/default.jsonl
#	boards_data/law/data_tasks/Retrieval/default.jsonl
#	boards_data/longembed/data_tasks/Retrieval/default.jsonl
#	boards_data/no/data_tasks/Classification/default.jsonl
#	boards_data/other-sts/data_tasks/STS/default.jsonl
#	boards_data/pl/data_overall/default.jsonl
#	boards_data/pl/data_tasks/Classification/default.jsonl
#	boards_data/pl/data_tasks/Clustering/default.jsonl
#	boards_data/pl/data_tasks/PairClassification/default.jsonl
#	boards_data/pl/data_tasks/Retrieval/default.jsonl
#	boards_data/pl/data_tasks/STS/default.jsonl
#	boards_data/rar-b/data_tasks/Retrieval/default.jsonl
#	boards_data/se/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_overall/default.jsonl
#	boards_data/zh/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_tasks/Clustering/default.jsonl
#	boards_data/zh/data_tasks/PairClassification/default.jsonl
#	boards_data/zh/data_tasks/Reranking/default.jsonl
#	boards_data/zh/data_tasks/Retrieval/default.jsonl
#	boards_data/zh/data_tasks/STS/default.jsonl
# Conflicts:
#	EXTERNAL_MODEL_RESULTS.json
#	all_data_tasks/0/default.jsonl
#	all_data_tasks/1/default.jsonl
#	all_data_tasks/10/default.jsonl
#	all_data_tasks/11/default.jsonl
#	all_data_tasks/12/default.jsonl
#	all_data_tasks/13/default.jsonl
#	all_data_tasks/15/default.jsonl
#	all_data_tasks/16/default.jsonl
#	all_data_tasks/17/default.jsonl
#	all_data_tasks/18/default.jsonl
#	all_data_tasks/19/default.jsonl
#	all_data_tasks/2/default.jsonl
#	all_data_tasks/20/default.jsonl
#	all_data_tasks/21/default.jsonl
#	all_data_tasks/22/default.jsonl
#	all_data_tasks/23/default.jsonl
#	all_data_tasks/28/default.jsonl
#	all_data_tasks/29/default.jsonl
#	all_data_tasks/3/default.jsonl
#	all_data_tasks/30/default.jsonl
#	all_data_tasks/31/default.jsonl
#	all_data_tasks/32/default.jsonl
#	all_data_tasks/33/default.jsonl
#	all_data_tasks/34/default.jsonl
#	all_data_tasks/35/default.jsonl
#	all_data_tasks/36/default.jsonl
#	all_data_tasks/4/default.jsonl
#	all_data_tasks/5/default.jsonl
#	all_data_tasks/6/default.jsonl
#	all_data_tasks/8/default.jsonl
#	all_data_tasks/9/default.jsonl
#	boards_data/da/data_tasks/Classification/default.jsonl
#	boards_data/en/data_overall/default.jsonl
#	boards_data/en/data_tasks/Classification/default.jsonl
#	boards_data/en/data_tasks/Clustering/default.jsonl
#	boards_data/en/data_tasks/PairClassification/default.jsonl
#	boards_data/en/data_tasks/Reranking/default.jsonl
#	boards_data/en/data_tasks/Retrieval/default.jsonl
#	boards_data/en/data_tasks/STS/default.jsonl
#	boards_data/en/data_tasks/Summarization/default.jsonl
#	boards_data/fr/data_overall/default.jsonl
#	boards_data/fr/data_tasks/Classification/default.jsonl
#	boards_data/fr/data_tasks/Clustering/default.jsonl
#	boards_data/fr/data_tasks/PairClassification/default.jsonl
#	boards_data/fr/data_tasks/Reranking/default.jsonl
#	boards_data/fr/data_tasks/Retrieval/default.jsonl
#	boards_data/fr/data_tasks/STS/default.jsonl
#	boards_data/fr/data_tasks/Summarization/default.jsonl
#	boards_data/no/data_tasks/Classification/default.jsonl
#	boards_data/other-cls/data_tasks/Classification/default.jsonl
#	boards_data/other-sts/data_tasks/STS/default.jsonl
#	boards_data/pl/data_overall/default.jsonl
#	boards_data/pl/data_tasks/Classification/default.jsonl
#	boards_data/pl/data_tasks/Clustering/default.jsonl
#	boards_data/pl/data_tasks/PairClassification/default.jsonl
#	boards_data/pl/data_tasks/Retrieval/default.jsonl
#	boards_data/pl/data_tasks/STS/default.jsonl
#	boards_data/rar-b/data_tasks/Retrieval/default.jsonl
#	boards_data/se/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_overall/default.jsonl
#	boards_data/zh/data_tasks/Classification/default.jsonl
#	boards_data/zh/data_tasks/Clustering/default.jsonl
#	boards_data/zh/data_tasks/PairClassification/default.jsonl
#	boards_data/zh/data_tasks/Reranking/default.jsonl
#	boards_data/zh/data_tasks/Retrieval/default.jsonl
#	boards_data/zh/data_tasks/STS/default.jsonl
Contributor Author

Samoed commented Aug 6, 2024

@KennethEnevoldsen @Muennighoff Can you take a look at the PR, please?

@KennethEnevoldsen KennethEnevoldsen changed the title from "Add rusian models" to "fix: Add Russian models" Aug 6, 2024
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Only minor changes; otherwise I believe it looks reasonable.

Contributor

There should be no need to change these files (that is done during CI). I would avoid pushing them.

Contributor Author

I'll merge them after the leaderboard update. I was checking how everything worked.

@KennethEnevoldsen
Contributor

@Muennighoff will you have the time to review this as well to ensure that we don't break the leaderboard?

@Muennighoff Muennighoff merged commit 2461f1b into embeddings-benchmark:main Aug 6, 2024
1 check passed
3 participants