The changes introduced in PR #39 make it impossible to set the dimensions or normalize parameters in the model_kwargs of the BedrockEmbeddings class. Example below:
```python
from langchain_aws import BedrockEmbeddings

# The options set here are overwritten
model_kwargs = {
    "dimensions": 256,
    "normalize": False,
}

embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="eu-central-1",
    model_kwargs=model_kwargs,
)

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

embedded = embeddings.embed_documents(texts=[text])
print(embedded[0])       # Vector is normalized because of the norm=True default in embed_documents()
print(len(embedded[0]))  # Prints a length of 1024 because of the dim=1024 default in embed_documents()
```
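As a stopgap for direct calls, the values can seemingly be forced through the explicit parameters the embed methods now accept (parameter names `dim`/`norm` are inferred from the defaults noted above; this does not help when a vector store calls `embed_documents()` internally):

```python
# Workaround for direct calls only: pass the values explicitly so the
# hard-coded defaults are not used. Parameter names inferred from the
# dim=1024 / norm=True defaults mentioned above.
embedded = embeddings.embed_documents(texts=[text], dim=256, norm=False)
print(len(embedded[0]))  # 256
```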
This is problematic when using BedrockEmbeddings with a vector store:
```python
import os

from langchain_aws import BedrockEmbeddings
from langchain_postgres.vectorstores import PGVector

DB_USER = os.getenv("PGUSER")
DB_PASSWORD = os.getenv("PGPASSWORD")
DB_HOST = os.getenv("PGHOST")
DB_PORT = os.getenv("PGPORT")
DB_NAME = os.getenv("PGDATABASE")

connection = f"postgresql+psycopg://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

model_kwargs = {
    "dimensions": 256,
    "normalize": False,
}

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="eu-central-1",
    model_kwargs=model_kwargs,
)

pgvector_store = PGVector(connection=connection, embeddings=embeddings)

# add_texts() calls embed_documents() and model_kwargs are overwritten again
pgvector_store.add_texts(texts=[text])
```
The expected behavior is for model_kwargs not to be overwritten by any method unless explicit values are provided.
EDIT: Even if the embeddings are created separately from the vector store, the problem persists: when the chain is invoked with the vector store retriever, it throws an error for mismatched vector dimensions, since it creates the query vector with the default 1024 dimensions:
```
sqlalchemy.exc.DataError: (psycopg.errors.DataException) different vector dimensions 512 and 1024
[SQL: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_id, langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.embedding <=> %(embedding_1)s AS distance
```
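Until this is fixed, one possible workaround is to subclass BedrockEmbeddings and re-apply the intended values on every call, so that vector stores invoking the embed methods internally cannot fall back to the hard-coded defaults. A rough sketch; the `dim`/`norm` parameter names, and whether `embed_query()` accepts them at all, are assumptions based on the defaults described above:

```python
from typing import List

from langchain_aws import BedrockEmbeddings


class PinnedBedrockEmbeddings(BedrockEmbeddings):
    """Sketch: force 256-dim, unnormalized vectors on every embed call."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Re-apply the intended values so callers like PGVector cannot
        # fall back to the dim=1024 / norm=True defaults.
        return super().embed_documents(texts, dim=256, norm=False)

    def embed_query(self, text: str) -> List[float]:
        # Assumes embed_query() gained the same parameters in PR #39;
        # otherwise the retriever's query vector is still 1024-dim.
        return super().embed_query(text, dim=256, norm=False)
```

Dropping PinnedBedrockEmbeddings in place of BedrockEmbeddings in the PGVector example above should keep document and query vectors at the same dimensionality.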
@trendafil-gechev
Thanks for opening this issue. The earlier PR seems to have deviated from the base Embeddings interface. You are right: we should not have added these as defaults directly in the embed methods, but rather applied them through model_kwargs or directly via the class.
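Roughly what that would look like (a minimal sketch, not an actual patch; the helper name is made up for illustration):

```python
# Sketch: resolve the Titan v2 parameters from model_kwargs instead of
# hard-coding them as embed-method defaults. The helper name is
# hypothetical; the real fix would live inside the embed path.
def resolve_titan_v2_params(model_kwargs: dict) -> dict:
    return {
        # Fall back to the model defaults only when the caller set nothing.
        "dimensions": model_kwargs.get("dimensions", 1024),
        "normalize": model_kwargs.get("normalize", True),
    }

resolve_titan_v2_params({"dimensions": 256, "normalize": False})
# -> {"dimensions": 256, "normalize": False}, no longer clobbered
```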
@bigbernnn
Is there ever a need for different embedding dimensions within a single embeddings class?