[FEATURE] Warning about Mismatch Between similarity function of Embedding Model and Index space_type #2356

Open
YeonghyeonKO opened this issue Dec 26, 2024 · 0 comments


Is your feature request related to a problem?

  • A problem can arise when embedding vectors (e.g. from msmarco-distilbert-base-tas-b; say its similarity function is cosine similarity) are indexed into a knn_vector field that is mapped with a different space_type (e.g. L2).
  • The similarity the embedding model is built around and the vector distance used by the HNSW graph can then differ, leading to inaccurate search scores.
  • Because OpenSearch stores a per-segment HNSW graph built by Faiss, NMSLIB, or Lucene, the results returned from the graph can vary depending on the space_type.
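
To make the mismatch concrete, here is a minimal sketch in plain Python (no OpenSearch involved). The query and document vectors are made-up toy values, not output from any real model; they only illustrate that cosine similarity and L2 distance can rank the same documents in opposite order when vectors are not normalized.

```python
import math

def cosine_similarity(u, v):
    # Higher means more similar; ignores vector magnitude.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def l2_distance(u, v):
    # Lower means more similar; sensitive to vector magnitude.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy vectors (assumed for illustration only).
query = [1.0, 0.0]
docs = {
    "a": [10.0, 1.0],  # almost the same direction as the query, but far away
    "b": [0.5, 0.5],   # 45 degrees off the query, but close by
}

by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
by_l2 = sorted(docs, key=lambda k: l2_distance(query, docs[k]))

print("ranked by cosine similarity:", by_cosine)
print("ranked by L2 distance:", by_l2)
```

The two rankings disagree here because the document vectors have different lengths. If every vector is normalized to unit length, L2 distance and cosine similarity produce the same ordering, since the squared distance between unit vectors equals 2 - 2·cos.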

What solution would you like?

  • Is there any benefit to using a space_type that differs from the embedding model's similarity function?
  • If not, I suggest displaying a warning message in the scenario above to alert users to potential inaccuracies.
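
As a rough illustration of the kind of check being requested, here is a client-side sketch. Everything in it is hypothetical: `EXPECTED_SPACE_TYPE`, `check_space_type`, and the similarity names `"cosine"`, `"dot"`, and `"euclidean"` are invented for this example and are not part of OpenSearch or the k-NN plugin. Only the space_type strings (`cosinesimil`, `innerproduct`, `l2`) are existing k-NN values. Where the model's similarity function would actually come from is left open.

```python
import warnings

# Hypothetical mapping from a model's declared similarity function
# to the k-NN space_type that matches it.
EXPECTED_SPACE_TYPE = {
    "cosine": "cosinesimil",
    "dot": "innerproduct",
    "euclidean": "l2",
}

def check_space_type(model_similarity, space_type):
    """Return True if space_type matches the model's similarity; warn otherwise.

    Hypothetical helper: unknown similarity names are treated as
    "cannot tell" and pass without a warning.
    """
    expected = EXPECTED_SPACE_TYPE.get(model_similarity)
    if expected is None or expected == space_type:
        return True
    warnings.warn(
        f"Index space_type '{space_type}' does not match the model's "
        f"similarity function '{model_similarity}' (expected '{expected}'); "
        "search scores may be inaccurate."
    )
    return False

# Example: a model assumed to use cosine similarity, indexed with L2.
check_space_type("cosine", "l2")
```

A warning rather than a hard error seems to fit the request, since there may be legitimate reasons to choose a different space_type.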