
Abnormally low values for NanoBEIR benchmark #1627

Open
minsik-ai opened this issue Dec 24, 2024 · 4 comments

Comments

@minsik-ai

Continuing from #1588

NanoBEIR performance on Touche2020 and NFCorpus is too low compared to reported values.

You can check out some of the values here: embeddings-benchmark/results#72

@isaac-chung
Collaborator

@minsik-ai could you please specify:

  1. which model you tried, the script and/or commands you used,
  2. the corresponding results file in that PR you linked, and
  3. what values (metrics) you're comparing

Thanks in advance!

@Samoed
Collaborator

Samoed commented Dec 24, 2024

The original blog post only presents results for e5-mistral-based models, and they're hard to reproduce because we don't know which prompts were used during evaluation. I think @ArthurCamara might be able to share some insights on how they evaluated models on NanoBEIR.
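For context on why the prompt matters: instruction-tuned e5 models wrap every query in an instruction template, and the scores shift depending on the task description used. A minimal sketch of the template as published in the intfloat/e5-mistral-7b-instruct model card (the task description below is a hypothetical example — the one actually used for the NanoBEIR numbers is the unknown part):

```python
def get_detailed_instruct(task_description: str, query: str) -> str:
    """Build an e5-mistral-style query prompt.

    Template follows the intfloat/e5-mistral-7b-instruct model card;
    the task_description used for the reported NanoBEIR runs is unknown.
    """
    return f"Instruct: {task_description}\nQuery: {query}"


# Hypothetical task description -- a different choice changes the scores.
task = "Given a web search query, retrieve relevant passages that answer the query"
prompt = get_detailed_instruct(task, "what is the capital of France?")
print(prompt)
```

Documents, by contrast, are encoded without any instruction prefix in that setup, so a mismatch on the query side alone can produce exactly this kind of per-task degradation.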

@Samoed
Collaborator

Samoed commented Dec 24, 2024

I've evaluated multilingual-e5-small on NanoBEIR with both MTEB and Sentence Transformers (Code). Scores are ndcg@10:

| Task | MTEB | Sentence Transformers |
|---|---|---|
| NanoArguAna | 0.44536 | 0.444486 |
| NanoClimateFever | 0.2222 | 0.30642 |
| NanoDBPedia | 0.17534 | 0.6053 |
| NanoFever | 0.80845 | 0.30642 |
| NanoFiQA2018 | 0.34363 | 0.4430 |
| NanoHotpotQA | 0.56911 | 0.81012 |
| NanoMSMARCO | 0.62091 | 0.62091 |
| NanoNFCorpus | 0.05535 | 0.2885 |
| NanoNQ | 0.67664 | 0.68618 |
| NanoQuora | 0.90621 | 0.97279 |
| NanoSCIDOCS | 0.20826 | 0.34377 |
| NanoSciFact | 0.71129 | 0.72457 |
| NanoTouche2020 | 0.19598 | 0.49540 |
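The gap can be summarized mechanically. A minimal sketch that flags tasks whose two ndcg@10 values (from the table above) differ by more than ~0.01 — the 0.015 tolerance is an arbitrary cutoff, not anything from either library:

```python
# ndcg@10 scores from the table above: task -> (MTEB, Sentence Transformers).
scores = {
    "NanoArguAna":      (0.44536, 0.444486),
    "NanoClimateFever": (0.2222,  0.30642),
    "NanoDBPedia":      (0.17534, 0.6053),
    "NanoFever":        (0.80845, 0.30642),
    "NanoFiQA2018":     (0.34363, 0.4430),
    "NanoHotpotQA":     (0.56911, 0.81012),
    "NanoMSMARCO":      (0.62091, 0.62091),
    "NanoNFCorpus":     (0.05535, 0.2885),
    "NanoNQ":           (0.67664, 0.68618),
    "NanoQuora":        (0.90621, 0.97279),
    "NanoSCIDOCS":      (0.20826, 0.34377),
    "NanoSciFact":      (0.71129, 0.72457),
    "NanoTouche2020":   (0.19598, 0.49540),
}

TOL = 0.015  # arbitrary: treat differences of ~0.01 or less as agreement
mismatched = sorted(t for t, (mteb, st) in scores.items() if abs(mteb - st) > TOL)
matched = sorted(t for t in scores if t not in mismatched)
print("mismatched:", mismatched)  # 9 tasks
print("matched:", matched)       # 4 tasks
```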

Not matching results:

  • NanoClimateFever
  • NanoDBPedia
  • NanoFever
  • NanoFiQA2018
  • NanoHotpotQA
  • NanoNFCorpus
  • NanoQuora
  • NanoSCIDOCS
  • NanoTouche2020

Matching results:

  • NanoArguAna
  • NanoMSMARCO
  • NanoNQ
  • NanoSciFact (diff 0.01)

@minsik-ai
Author

@Samoed's findings show the main difference I've seen!
NFCorpus is in the 0.05 range for MTEB, compared to the 0.2 range for Sentence Transformers.
I've also run additional experiments with intfloat/e5-mistral-7b-instruct and seen similar performance degradation.
