Investigate performance discrepancies in gte-Qwen and NV-embed models #1600
Comments
Hello, I conducted a comparison of the models using the examples provided in the
In these cases, the official Transformers AutoModel implementation differs from the official sentence_transformers implementation, which is unexpected. The implementation in mteb aligns completely with sentence_transformers. I also wanted to share the code I used for this comparison (see the Gist). Please note that questions regarding the correctness of prompt usage were outside the scope of this study. However, the comparison does show that the models added to mteb are correctly implemented. P.S.
The Qwen model repository includes a script to calculate scores for their models on the MTEB benchmark. I ran this script on the same tasks covered in my pull request. In most cases, the results from the original script are worse than those reported on the leaderboard, and they also fall short of the results obtained using the code from the MTEB models.
Additionally, there is an open discussion about this on the Qwen model repository.
[Results table by task type: Classification, Clustering, PairClassification, Reranking, Retrieval, STS, Summarization]
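For readers who want to run a similar check, the kind of comparison described above (embeddings from one implementation against embeddings from another for the same inputs) can be sketched roughly as follows. This is not the code from the Gist: the function names, the `min_cosine` threshold, and the placeholder vectors are all assumptions made for illustration, and real embeddings would come from the two implementations being compared.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_embeddings(emb_a, emb_b, min_cosine=0.999):
    """Compare two lists of embeddings row by row.

    emb_a and emb_b are hypothetical outputs of two implementations
    for the same input texts, in the same order. The threshold is an
    arbitrary illustrative choice, not a value taken from the thread.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both implementations must embed the same inputs")
    sims = [cosine_similarity(a, b) for a, b in zip(emb_a, emb_b)]
    worst = min(sims)
    return {"min_cosine": worst, "match": worst >= min_cosine}


# Placeholder vectors, not real model outputs.
reference = [[1.0, 0.0], [0.0, 2.0]]
same_direction = [[2.0, 0.0], [0.0, 1.0]]  # differs only in scale
different = [[0.0, 1.0], [0.0, 2.0]]       # first row points elsewhere

print(compare_embeddings(reference, same_direction))
print(compare_embeddings(reference, different))
```

Cosine similarity ignores differences in vector scale, so this check would not flag a case where one implementation normalizes its output and the other does not; an element-wise difference would be needed for that.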
Right, from this it seems like we should update the scores on the leaderboard with the new reproducible scores. Since the authors have been made aware (issues on NVIDIA and on Qwen), I believe this is a fair decision to make. @AlexeyVatolin, have you run the models? Otherwise I will ask Niklas to rerun them.
I'm a member of the gte-Qwen model series team. Sorry, we checked and found some errors in the previous script. It has now been updated and verified to be consistent with the results on the leaderboard. Please try again with the latest script to check the results.
@afalf, thanks a lot! I've run the gte-Qwen models with the updated script and will post as soon as I have results.
From #1436