Unable to reproduce the metrics of BAAI/bge-reranker-large on the T2Reranking dataset. #1581

HuDi2018 · 2024-12-12T04:46:46Z

My reproduction script:

import mteb
from sentence_transformers import SentenceTransformer

model_name = "BAAI/bge-reranker-large"
result_folder_name = "T2Reranking"

model = SentenceTransformer(model_name, cache_folder="/mnt/ckpt/")
tasks = mteb.get_tasks(tasks=["T2Reranking"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"work_dirs/{result_folder_name}", save_predictions=True, encode_kwargs={"batch_size":128})

Resulting JSON:

{
  "dataset_revision": "76631901a18387f85eaa53e5450019b87ad58ef9",
  "evaluation_time": 836.0624091625214,
  "kg_co2_emissions": null,
  "mteb_version": "1.19.4",
  "scores": {
    "dev": [
      {
        "hf_subset": "default",
        "languages": [
          "cmn-Hans"
        ],
        "main_score": 0.5106703213152956,
        "map": 0.5106703213152956,
        "mrr": 0.583640624496244,
        "nAUC_map_diff1": -0.04473865428840341,
        "nAUC_map_max": 0.042846196592508184,
        "nAUC_map_std": -0.043888763247088756,
        "nAUC_mrr_diff1": -0.07990475962222543,
        "nAUC_mrr_max": 0.07057809005533859,
        "nAUC_mrr_std": -0.06398509636797499
      }
    ]
  },
  "task_name": "T2Reranking"
}

The same issue has been reported on the model repos (it seems all reranker models have this problem):
bce-rerank: netease-youdao/BCEmbedding#98
bge-rerank: FlagOpen/FlagEmbedding#1285

KennethEnevoldsen · 2024-12-12T05:05:07Z

Judging from the Hugging Face implementation, it seems that they normalize their embeddings (that shouldn't cause the difference, but it would be nice to rule out).

I believe these were the scores you were expecting?

You can also try running https://huggingface.co/intfloat/multilingual-e5-small to rule out a bug in the dataset implementation (I would be surprised, as recent results seem to obtain similar scores >65).

(I also recommend opening an issue on this on their model page and linking it here)
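
To illustrate why normalization is unlikely to explain the gap: if candidates are ranked by cosine similarity (an assumption about the scoring used here), L2-normalizing the embeddings cannot change their order. A minimal sketch with made-up vectors; the helper names are hypothetical and not part of mteb:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def rank_by_cosine(query, docs):
    """Indices of docs, most similar to the query first."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


# Made-up embeddings, for illustration only.
query = [1.0, 2.0]
docs = [[2.0, 4.1], [3.0, -1.0], [0.5, 0.2]]

raw_order = rank_by_cosine(query, docs)
normalized_order = rank_by_cosine(l2_normalize(query), [l2_normalize(d) for d in docs])
print(raw_order == normalized_order)
```

With dot-product scoring instead, normalization could change the order, so this only rules it out for the cosine case.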

HuDi2018 · 2024-12-12T05:44:22Z

Thanks for the reply.
I tried lier007/xiaobu-embedding-v2 (an embedding model) on T2Reranking, and the results seem to be correct.
I suspect there might be an issue with how reranker models are evaluated?
Anyway, I will try your suggestions later to see if they solve the problem.

KennethEnevoldsen · 2024-12-12T06:20:40Z

Perfect - Let me know how it goes

shenlei1020 · 2024-12-13T04:57:21Z

Modify the evaluation for cross-encoders:
https://github.com/netease-youdao/BCEmbedding/blob/c6327ccc854d65e8d4eb2edac74dbb6eb67733ec/BCEmbedding/evaluation/c_mteb/Reranking.py#L9
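
For readers unfamiliar with the distinction: a cross-encoder scores each (query, passage) pair jointly instead of embedding the two texts separately, so the evaluation has to call a pair scorer rather than compare embeddings. The sketch below is not the code at the link above. It is a minimal, hypothetical illustration of MAP computed from pair scores, assuming each sample has "query", "positive", and "negative" fields. The toy scorer is a stand-in; in practice one could pass something like the `predict` method of a `sentence_transformers.CrossEncoder` instead.

```python
def average_precision(labels_in_ranked_order):
    """Average precision for one query, given binary labels in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(labels_in_ranked_order, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rerank_map(samples, score_pairs):
    """Mean average precision when each candidate is scored jointly with the query.

    samples: iterable of dicts with "query", "positive", "negative" keys (assumed layout).
    score_pairs: callable taking a list of (query, passage) pairs and
        returning one relevance score per pair.
    """
    ap_values = []
    for sample in samples:
        docs = list(sample["positive"]) + list(sample["negative"])
        labels = [1] * len(sample["positive"]) + [0] * len(sample["negative"])
        scores = score_pairs([(sample["query"], doc) for doc in docs])
        order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        ap_values.append(average_precision([labels[i] for i in order]))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


def toy_overlap_scorer(pairs):
    """Hypothetical stand-in for a cross-encoder: counts shared words."""
    return [float(len(set(q.split()) & set(p.split()))) for q, p in pairs]


# Made-up sample, for illustration only.
samples = [
    {
        "query": "red apple",
        "positive": ["a red apple pie"],
        "negative": ["blue sky", "green apple"],
    }
]
map_score = rerank_map(samples, toy_overlap_scorer)
print(map_score)
```

Keeping the scorer as a plain callable means the same loop works for any pair-scoring model, which is the main design point here.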
