-
Notifications
You must be signed in to change notification settings - Fork 677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add GraphDBBlock to LongtermAgentMemory and add GraphDBMemory #851
base: master
Are you sure you want to change the base?
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mrsbeep for the PR. There may be still something to be redeign for this PR.
An example on how to use Neo4jGraph
for GraphRAG here:
https://colab.research.google.com/drive/1meBf9w8KzZvQdQU2I1bCyOg9ehoGDK1u#scrollTo=l4VANToobUzu
A couple things to be discussed:
- What is the default structure of the
LongtermAgentMemory
? Should we always have bothVectorDBBlock
andGraphDBBlock
or allow different combinations? - Should we allow user to load a pre-built knowledge graph?
- Should we allow agent to automatically convert chat messages into knowledge graphs?
- How to score, retrieve and decide whether to use the retrieved results?
- How to combine to retrieved KGs with
vector_db_retrieve
andchat_history
?
""" | ||
|
||
def __init__(self, storage: Optional[BaseGraphStorage] = None) -> None: | ||
self.storage = storage or self.default_graph_storage() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use Neo4jGraph
as the default storage?
class Neo4jGraph(BaseGraphStorage): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently using BaseGraphStorage as storage in GraphDBBlock. I will make it so that Neo4jGraph can use it too.
camel/memories/agent_memories.py
Outdated
self, | ||
context_creator: BaseContextCreator, | ||
storage: Optional[BaseGraphStorage] = None, | ||
query: str = "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use self._current_topic
similar to VectorDBMemory
for the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here haven't been update
results = self.storage.query(query, params) | ||
score = 1.0 | ||
return [ | ||
ContextRecord( | ||
memory_record=MemoryRecord.from_dict(result), score=score | ||
) | ||
for result in results | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this run?
graph database. | ||
""" | ||
results = self.storage.query(query, params) | ||
score = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score
should not be 1.0
for all ContextRecord
What is the default structure of the LongtermAgentMemory? Should we always have both VectorDBBlock and GraphDBBlock or allow different combinations? Should we allow user to load a pre-built knowledge graph? Should we allow agent to automatically convert chat messages into knowledge graphs? How to score, retrieve and decide whether to use the retrieved results? How to Combine Retrieved Knowledge Graphs with VectorDBRetrieve and Chat History? |
…memory' of https://github.com/camel-ai/camel into feature/add-graphdb-memory
…-ai/camel into feature/add-graphdb-memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mrsbeep 's great contribution! Left some comments below~
def default_graph_storage( | ||
self, | ||
url: Optional[str] = None, | ||
username: Optional[str] = None, | ||
password: Optional[str] = None, | ||
database: Optional[str] = "neo4j", | ||
) -> BaseGraphStorage: | ||
# Try to get details from environment variables first | ||
url = os.getenv('NEO4J_URI', url) | ||
username = os.getenv('NEO4J_USERNAME', username) | ||
password = os.getenv('NEO4J_PASSWORD', password) | ||
database = os.getenv('NEO4J_DATABASE', database) | ||
|
||
# Log an error if any of the necessary values are missing | ||
if not url or not username or not password: | ||
logger.error( | ||
"Neo4j connection detailed are missing." | ||
"Ensure environment variables or parameters are set for" | ||
"NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD." | ||
) | ||
raise ValueError("Missing Neo4j connection details.") | ||
|
||
# Create and return an instance of Neo4jGraph | ||
return Neo4jGraph(url, username, password, database or "neo4j") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just pass config information to Neo4jGraph
directly since the environment variable fetching and checking would be done from class Neo4jGraph
's side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has already been corrected.
def _calculate_scores(self, query: str, results: List[Any]) -> List[float]: | ||
query_graph = self._build_graph(query) | ||
scores = [] | ||
for result in results: | ||
result_graph = self._build_graph(result) | ||
score = self._graph_similarity(query_graph, result_graph) | ||
scores.append(score) | ||
return scores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the score I think using an embedding-based vector would be more effective for calculating the score, as it aligns better with matching the user's query. A semantic-based score would be more suitable in this case. WDYT?
else: | ||
subj = content.get('subject') | ||
obj = content.get('object') | ||
rel = content.get('relation') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record.message.content
should be str
, for what case it will go to this part?
""" | ||
self.storage.add_triplet(subj, obj, rel) | ||
|
||
def delete_triplet(self, subj: str, obj: str, rel: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the delete_triplet
shouldn't be put into here, it will be managed from graph db's side, but we can add delete_records
method
camel/memories/agent_memories.py
Outdated
query (str, optional): | ||
The query to retrieve data from the graph database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
query (str, optional): | |
The query to retrieve data from the graph database. | |
query (str, optional): The query to retrieve data from the graph | |
database. |
def clear(self) -> None: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also add code to realize this method if it's required by the memory module
camel/memories/agent_memories.py
Outdated
def write_triplet(self, subj: str, obj: str, rel: str) -> None: | ||
r"""Writes a triplet to the graph database. | ||
|
||
Args: | ||
subj (str): The subject of the triplet. | ||
obj (str): The object of the triplet. | ||
rel (str): The relationship between subject and object. | ||
""" | ||
self._graphdb_block.write_triplet(subj, obj, rel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here should be write_records
using self._graphdb_block.write_record()
camel/memories/agent_memories.py
Outdated
self, | ||
context_creator: BaseContextCreator, | ||
storage: Optional[BaseGraphStorage] = None, | ||
query: str = "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here haven't been update
results = self.storage.query(query, params) | ||
scores = self._calculate_scores(query, results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here the query is some query language like Cypher, calculate the score between the query and the matched result from graph database seems doesn't make sense
…-ai/camel into feature/add-graphdb-memory
window_size (int, optional): The number of recent chat messages to | ||
retrieve. If not provided, the entire chat history will be | ||
retrieved. (default: :obj:`None`) | ||
. Args: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean .
here
storage (BaseGraphStorage, optional): A graph storage backend. If | ||
`None`, a default implementation will be used. (default: `None`) | ||
query (str, optional): The query to retrieve data from the graph | ||
database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage (BaseGraphStorage, optional): A graph storage backend. If | |
`None`, a default implementation will be used. (default: `None`) | |
query (str, optional): The query to retrieve data from the graph | |
database. | |
storage (BaseGraphStorage, optional): A graph storage backend. If | |
:obj:`None`, a default implementation will be used. (default: | |
:obj:`None`) | |
query (str, optional): The query to retrieve data from the graph | |
database. |
) -> None: | ||
self._context_creator = context_creator | ||
self._graphdb_block = GraphDBBlock(storage=storage) | ||
self.query_embedding = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems this is defined but not used
@@ -130,7 +177,13 @@ def retrieve(self) -> List[ContextRecord]: | |||
vector_db_retrieve = self.vector_db_block.retrieve( | |||
self._current_topic, self.retrieve_limit | |||
) | |||
return chat_history[:1] + vector_db_retrieve + chat_history[1:] | |||
graph_db_retrieve = self.graph_db_block.retrieve() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here will retrieve all messages stored in graph db, why not limit it to specific topic like how vector_db_retrieve
did?
storage (Optional[BaseGraphStorage], optional): The storage mechanism | ||
for the graph database. Defaults to a specific implementation | ||
if not provided. (default: None) | ||
embedding (Optional[BaseEmbedding], optional): Embedding mechanism | ||
toconvert text into vector representations. Defaults to | ||
OpenAIEmbedding if not provided. (default: None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage (Optional[BaseGraphStorage], optional): The storage mechanism | |
for the graph database. Defaults to a specific implementation | |
if not provided. (default: None) | |
embedding (Optional[BaseEmbedding], optional): Embedding mechanism | |
toconvert text into vector representations. Defaults to | |
OpenAIEmbedding if not provided. (default: None) | |
storage (BaseGraphStorage, optional): The storage mechanism | |
for the graph database. Defaults to :obj:`Neo4jGraph` | |
if not provided. | |
embedding (BaseEmbedding, optional): Embedding mechanism | |
toconvert text into vector representations. Defaults to | |
:obj:`OpenAIEmbedding`. |
url: Optional[str] = None, | ||
username: Optional[str] = None, | ||
password: Optional[str] = None, | ||
database: Optional[str] = "neo4j", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these information should be passed to BaseGraphStorage
level rather than here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @mrsbeep! The code is thoughtfully designed, but I believe we need to clarify the problem we're solving by using Graph Memory. In my opinion, Graph Memory should be used to enhance the representation of relationships between different entities, rather than just retrieving the messages themselves. At the moment, some of your design seems tailored more toward message sequencing
results = self.storage.query( | ||
""" | ||
MATCH (n:Message) | ||
WHERE exists(n.embedding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this query requires embedding exits, it would be better we can create embedding before this query by running CREATE VECTOR INDEX
""" | ||
MATCH (n:Message) | ||
WHERE exists(n.embedding) | ||
WITH n, gds.similarity.cosine( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use db.index.vector.queryNodes
instead, since gds need additional dependency
r"""Retrieves records from the graph database based on query embedding. | ||
|
||
Args: | ||
query_embedding (Optional[List[float]]): The embedding vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can support the query input as topic or key word, then do the embedding internally
query_embedding (Optional[List[float]]): The embedding vector | ||
for similarity search. | ||
limit (int): The maximum number of records to retrieve. | ||
Defaults to 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update the docstring format for default value
for similarity search. | ||
limit (int): The maximum number of records to retrieve. | ||
Defaults to 10. | ||
order_by (str): The property to order by in the default query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use numberOfNearestNeighbours
would be better
return [] | ||
|
||
def write_records(self, records: List[MemoryRecord]) -> None: | ||
"""Writes recordes to the graph database.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Writes recordes to the graph database.""" | |
r"""Writes recordes to the graph database.""" |
|
||
properties = flatten_and_convert(properties) | ||
|
||
content = record.message.content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here we add the message as one node to graph database, I feel it would be better if we can extract the entity from this message using KnowledgeGraphAgent
, then add entity to the graph database. Since the relationship between entities in a message makes more value than the relationship between 2 messages. WDYT?
self.storage.delete_triplet( | ||
subj=content, obj=content, rel="NEXT" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, use entity from message content rather than content it self would be better? BTW why rel="NEXT"
here, how it used?
def add_edge( | ||
self, | ||
start_node_id: str, | ||
end_node_id: str, | ||
relationship_type: str, | ||
properties: Dict[str, Any] | None = None, | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we add this as we already have add_triplet
?
def _add_relationship_to_previous(self, current_node_uuid: str) -> None: | ||
if self.previous_node_id is None: | ||
return | ||
|
||
query = ( | ||
"MATCH (prev {uuid: $prev_uuid}), (current {uuid: $current_uuid}) " | ||
"CREATE (prev)-[:NEXT]->(current)" | ||
) | ||
|
||
try: | ||
with self.driver.session(database=self.database) as session: | ||
session.run( | ||
query, | ||
{ | ||
"prev_uuid": self.previous_node_id, | ||
"current_uuid": current_node_uuid, | ||
}, | ||
) | ||
logger.info( | ||
f"Created relationship from {self.previous_node_id} " | ||
f"to {current_node_uuid}." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this relationship is specially designed for message sequence, but I think we use KG is to get relationship between entities of the message content, rather than just retrieve linked message back, which is no much difference with ChatHistoryBlock
Description
This pull request introduces a new feature to the LongtermAgentMemory by integrating a GraphDBBlock and adding a new GraphDBMemory. The changes aim to leverage knowledge graphs within AgentMemory, enhancing the capability to store and retrieve structured information through graph databases.
Motivation and Context
close #846
Types of changes
What types of changes does your code introduce? Put an
x
in all the boxes that apply:Implemented Tasks
Checklist
Go over all the following points, and put an
x
in all the boxes that apply.If you are unsure about any of these, don't hesitate to ask. We are here to help!