Integration with Vector Databases #1
Walkthrough
The recent changes introduce advanced vector handling and querying capabilities to […]
Actionable comments posted: 9
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (8)
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/data_level0.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/header.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/length.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/link_lists.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/data_level0.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/header.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/length.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/link_lists.bin is excluded by !**/*.bin
Files selected for processing (4)
- SimplerLLM/language/llm.py (3 hunks)
- SimplerLLM/tools/vector_db.py (1 hunks)
- new.py (1 hunks)
- requirements.txt (1 hunks)
Files skipped from review due to trivial changes (1)
- requirements.txt
Additional context used
Ruff
SimplerLLM/tools/vector_db.py
- 1-1: os imported but unused. Remove unused import: os (F401)
new.py
- 3-3: os imported but unused. Remove unused import: os (F401)
SimplerLLM/language/llm.py
- 1-1: os imported but unused. Remove unused import: os (F401)
- 2-2: dotenv.load_dotenv imported but unused. Remove unused import: dotenv.load_dotenv (F401)
- 4-4: SimplerLLM.language.llm_providers.openai_llm.generate_response imported but unused. Remove unused import: SimplerLLM.language.llm_providers.openai_llm.generate_response (F401)
- 5-5: SimplerLLM.language.llm_providers.openai_llm.generate_response_async imported but unused. Remove unused import: SimplerLLM.language.llm_providers.openai_llm.generate_response_async (F401)
- 116-116: Undefined name openai_llm (F821)
- 155-155: Undefined name openai_llm (F821)
- 215-215: Undefined name gemini_llm (F821)
- 252-252: Undefined name gemini_llm (F821)
- 301-301: Undefined name anthropic_llm (F821)
- 339-339: Undefined name anthropic_llm (F821)
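The F401 and F821 findings above often appear together: a function is imported by name and never used, while the code calls it through a module name that was never bound. A minimal sketch of that pattern, using the standard library's colorsys module as a stand-in (the function names here are illustrative and are not taken from the PR):

```python
def call_without_module_binding():
    # Only the function name is bound here; the module name `colorsys` is not.
    from colorsys import rgb_to_hsv  # noqa: F401 (imported but unused)
    try:
        # Ruff would flag this line as F821: `colorsys` is undefined.
        return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # noqa: F821
    except NameError as exc:
        return f"NameError: {exc}"


def call_with_module_binding():
    # Importing the module itself binds the name used in the call.
    import colorsys
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)


print(call_without_module_binding())
print(call_with_module_binding())
```

If llm.py follows the same pattern, importing the provider modules themselves (or calling the imported functions directly) would likely clear both kinds of warning; this is an inference from the Ruff output, not something the diff confirms.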
Additional comments not posted (4)
SimplerLLM/tools/vector_db.py (2)
25-26: Review of query_similar method.
This method is a simple wrapper around query_vectors, which is a good use of code reuse. Ensure that the behavior of query_vectors is as intended, as any change will affect this method too.
15-16: Review of store_vectors method.
The method correctly stores vectors with unique IDs. However, consider handling potential exceptions that might arise from database operations to improve robustness.
new.py (1)
59-64: Review of main function and script entry point.
The main function and the script's entry point are well-defined. It's good practice to include such entry points in scripts intended for direct execution.
SimplerLLM/language/llm.py (1)
58-62: Review of vector storage and querying methods.
The methods store_response_as_vector and find_similar_responses are straightforward and utilize the VectorDB class effectively. Ensure that the underlying methods in VectorDB handle errors appropriately.
def __init__(self):
    persistence_directory = "./chroma_db"
    self.client = chromadb.PersistentClient(path=persistence_directory)
    self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
    self.collection = self.client.get_or_create_collection(
        name="responses",
        embedding_function=self.embedding_function
    )
Review of VectorDB class constructor.
The constructor initializes the PersistentClient and sets up a collection with an embedding function. The hard-coded path for the database ("./chroma_db") could be made configurable to enhance flexibility in different environments.
- persistence_directory = "./chroma_db"
+ def __init__(self, persistence_directory="./chroma_db"):
+ self.client = chromadb.PersistentClient(path=persistence_directory)
Committable suggestion was skipped due to low confidence.
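One way to make the path configurable, sketched under assumptions: the helper below picks an explicit argument first, then an environment variable, then the default from the PR. The variable name SIMPLERLLM_CHROMA_DIR and the helper itself are hypothetical and are not part of the PR.

```python
import os
from pathlib import Path

DEFAULT_DIR = "./chroma_db"          # default used in the PR's constructor
ENV_VAR = "SIMPLERLLM_CHROMA_DIR"    # hypothetical variable name


def resolve_persistence_directory(explicit=None, env=None):
    """Pick the directory: explicit argument, then environment, then default."""
    env = os.environ if env is None else env
    chosen = explicit or env.get(ENV_VAR) or DEFAULT_DIR
    # expanduser lets callers pass paths such as "~/vectors"
    return str(Path(chosen).expanduser())


print(resolve_persistence_directory(env={}))
print(resolve_persistence_directory("/tmp/vectors", env={}))
```

The returned string could then be passed as the path argument to chromadb.PersistentClient, as the constructor above already does with its hard-coded value.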
def store_response(self, text):
    self.collection.add(documents=[text], ids=[f"id_{self.collection.count()}"])
Review of store_response method.
The method adds a single document to the collection. Using the collection's count as an ID is risky as it can lead to race conditions in a concurrent environment. Consider using a more robust method for generating unique IDs.
- ids=[f"id_{self.collection.count()}"]
+ import uuid
+ ids=[str(uuid.uuid4())]
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def store_response(self, text):
    import uuid
    self.collection.add(documents=[text], ids=[str(uuid.uuid4())])
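Besides the race condition the review mentions, count-based IDs can also repeat after a deletion. The sketch below shows this with a small in-memory stand-in; FakeCollection is hypothetical and only imitates the count/add/delete shape, so it says nothing about how chromadb itself reacts to a duplicate ID.

```python
import uuid


class FakeCollection:
    """In-memory stand-in for a collection; this is NOT chromadb's API."""

    def __init__(self):
        self._docs = {}

    def count(self):
        return len(self._docs)

    def add(self, documents, ids):
        for doc_id, doc in zip(ids, documents):
            if doc_id in self._docs:
                raise ValueError(f"duplicate id: {doc_id}")
            self._docs[doc_id] = doc

    def delete(self, ids):
        for doc_id in ids:
            self._docs.pop(doc_id, None)


def store_with_count_id(collection, text):
    # ID derived from the current size, as in the original store_response
    doc_id = f"id_{collection.count()}"
    collection.add(documents=[text], ids=[doc_id])
    return doc_id


def store_with_uuid(collection, text):
    # Random UUID, as in the reviewer's suggestion
    doc_id = str(uuid.uuid4())
    collection.add(documents=[text], ids=[doc_id])
    return doc_id
```

With three documents stored and the first one deleted, the count drops to 2, so the next count-based ID is id_2 again and collides with an existing entry; the UUID variant does not depend on the collection's size.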
self.vector_db = VectorDB()
Review of LLM class constructor.
The constructor initializes a VectorDB instance. It's good practice to allow dependency injection for better testing and flexibility.
- self.vector_db = VectorDB()
+ def __init__(self, vector_db=None):
+ self.vector_db = vector_db if vector_db else VectorDB()
Committable suggestion
def __init__(self, vector_db=None):
    self.vector_db = vector_db if vector_db else VectorDB()
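A sketch of how the injected dependency might be used in a test. One caveat with the truthiness check in the suggestion: an injected object that defines __len__ and is currently empty evaluates as false and would be silently replaced, so this sketch compares against None instead. InMemoryVectorDB and DefaultVectorDB are hypothetical stand-ins so the example runs without chromadb; the method names mirror those quoted in the review.

```python
class InMemoryVectorDB:
    """Test double that records stored texts instead of embedding them."""

    def __init__(self):
        self.stored = []

    def store_response(self, text):
        self.stored.append(text)

    def __len__(self):
        # An empty instance is falsy, which is what trips `if vector_db`.
        return len(self.stored)


class DefaultVectorDB(InMemoryVectorDB):
    """Placeholder for the real VectorDB class in this sketch."""


class LLM:
    def __init__(self, vector_db=None):
        # `is None` keeps an injected object even when it is falsy.
        self.vector_db = DefaultVectorDB() if vector_db is None else vector_db

    def store_response_as_vector(self, text):
        self.vector_db.store_response(text)


fake = InMemoryVectorDB()
llm = LLM(vector_db=fake)
llm.store_response_as_vector("hello")
print(fake.stored)
```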
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- new.py (1 hunks)
Additional context used
Ruff
new.py
- 3-3: os imported but unused. Remove unused import: os (F401)
Additional comments not posted (1)
new.py (1)
62-67: LGTM!
The main function is straightforward and does not require changes.
print("\nQuerying for similar responses:")
for query_prompt in query_prompts:
    print(f"\nQuery: {query_prompt}")
    start_time = time.time()
    similar_responses = llm.find_similar_responses(query_prompt)
    end_time = time.time()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    print("Similar responses:")
    for i, response in enumerate(similar_responses, 1):
        print(f"{i}. {response}")
Add error handling to the querying process.
The loop for querying similar responses is clear and straightforward. However, adding error handling would improve the robustness of the test.
+ try:
similar_responses = llm.find_similar_responses(query_prompt)
+ except Exception as e:
+ print("Error occurred:", e)
Committable suggestion
print("\nQuerying for similar responses:")
for query_prompt in query_prompts:
    print(f"\nQuery: {query_prompt}")
    start_time = time.time()
    try:
        similar_responses = llm.find_similar_responses(query_prompt)
    except Exception as e:
        print("Error occurred:", e)
        continue
    end_time = time.time()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    print("Similar responses:")
    for i, response in enumerate(similar_responses, 1):
        print(f"{i}. {response}")
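The timing and error handling in the loop could also be factored into a small helper, which keeps the loop body short and makes the failure path testable. This is a sketch, not the PR's code: timed_query is a hypothetical name, and it uses time.perf_counter, a monotonic clock intended for measuring intervals, in place of time.time.

```python
import time


def timed_query(find_similar, query_prompt):
    """Return (results, elapsed_seconds, error); results is [] on failure."""
    start = time.perf_counter()
    try:
        results, error = list(find_similar(query_prompt)), None
    except Exception as exc:  # broad on purpose for a demo script
        results, error = [], exc
    elapsed = time.perf_counter() - start
    return results, elapsed, error


def fake_find_similar(prompt):
    # Stand-in for llm.find_similar_responses so the sketch runs on its own.
    return [f"echo: {prompt}"]


results, elapsed, error = timed_query(fake_find_similar, "What is AI?")
print(results, f"{elapsed:.2f} seconds", error)
```

In the test script, llm.find_similar_responses would be passed as the first argument, and the caller would print the error and continue when the third element is not None.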
I have added functions for storing text in, and querying it from, a vector database. I used Chroma (chromadb) with the all-MiniLM-L6-v2 model from the Sentence Transformers library.
In SimplerLLM/tools/vector_db.py, I added the following code:
Then, in SimplerLLM/language/llm.py, I made the following modifications, in addition to the existing code, to invoke the vector database:
Initialized an instance of the class
Then
The following libraries need to be installed:
pip install chromadb sentence-transformers
Finally, in requirements.txt, I specified the correct versions.
You can test this by running the following sample code:
Summary by CodeRabbit
New Features
Enhancements
Dependencies
- Added sentence-transformers and chromadb to the project dependencies.