Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load message history to core chat from schema chat #429

Merged
merged 6 commits into from
Jun 13, 2024

Conversation

blakerosenthal
Copy link
Contributor

Basic prereq for passing chat history down the pipeline. History is already present in the schema.Chat object loaded from the database, so we just need to make sure it also gets loaded into the ragna.core.Chat object inside schema_to_core_chat.

with get_session() as session:
chat = database.get_chat(session, user=user, id=id)
chat.messages.append(
schemas.Message(content=prompt, role=ragna.core.MessageRole.USER)
)
core_chat = schema_to_core_chat(session, user=user, chat=chat)
core_answer = await core_chat.answer(prompt, stream=stream)

ragna/deploy/_api/schemas.py Outdated Show resolved Hide resolved
ragna/deploy/_api/core.py Outdated Show resolved Hide resolved
Comment on lines 69 to 74
def to_core(self) -> ragna.core.Message:
return ragna.core.Message(
content=self.content,
role=self.role,
sources=[],
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmeier Tell me if this makes sense:
For the purpose of chat history, we don't need Sources for each individual message since they already exist on the chat object. So we don't need to rebuild them here (?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since they already exist on the chat object

They don't. Source's are returned by the SourceStorage and are never saved on the Chat but only ever on the Message

ragna/ragna/core/_rag.py

Lines 221 to 227 in 8ea62a3

sources = await self._run(self.source_storage.retrieve, self.documents, prompt)
answer = Message(
content=self._run_gen(self.assistant.answer, prompt, sources),
role=MessageRole.ASSISTANT,
sources=sources,
)

If you have a look at the ORM model for the chat, we also have no connection to the sources table, but only the messages

class Chat(Base):
__tablename__ = "chats"
id = Column(types.Uuid, primary_key=True) # type: ignore[attr-defined]
user_id = Column(ForeignKey("users.id"))
name = Column(types.String, nullable=False)
documents = relationship(
"Document",
secondary=document_chat_association_table,
back_populates="chats",
)
source_storage = Column(types.String, nullable=False)
assistant = Column(types.String, nullable=False)
params = Column(Json, nullable=False)
messages = relationship(
"Message", cascade="all, delete", order_by="Message.timestamp"
)
prepared = Column(types.Boolean, nullable=False)

The messages table however is connected to the sources table

source_message_association_table = Table(
"source_message_association_table",
Base.metadata,
Column("source_id", ForeignKey("sources.id"), primary_key=True),
Column("message_id", ForeignKey("messages.id"), primary_key=True),
)

Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after 8a984a6 is reverted as explained in #429 (comment).

This reverts commit 8a984a6.
@blakerosenthal
Copy link
Contributor Author

@pmeier Mypy points out that you can't construct a ragna.core.Document from a schema.Document because it's missing the metadata field. I see a couple ways around that:

  1. Add a metadata field to schema.Document, although it seems like that might interact weirdly with the multi-document code that's strewn around that keeps the documents and metadata separate. It would create some redundancies to clean up, but I think it could be nice to just have metadata on the schema object.
  2. Cascade all the object conversions directly in schema_to_core_chat kind of like the functions found in database.py.

I'm leaning toward option 1 unless there's a simpler way.

@pmeier
Copy link
Member

pmeier commented Jun 13, 2024

Ah yeah, I'm hitting the same on a another branch as well. I agree we should go with option 1 especially in the light of #256.

No need to duplicate work here. Just set metadata={} and add a comment that this will be resolved later.

@blakerosenthal blakerosenthal merged commit 45d5f94 into Quansight:main Jun 13, 2024
21 checks passed
blakerosenthal added a commit that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants