
Update LangChain Support #2188

Open · wants to merge 2 commits into master

Conversation

@Skar0 commented Oct 16, 2024

What does this PR do?

WIP, fixes #2187

@Skar0 (Author) commented Oct 18, 2024

I just realized that the RunnablePassthrough() in the code sample provided in the documentation is not correct: it passes the whole input (the input_documents key plus the question key) through as the value of the question key, so the prompt ends up containing the full input dictionary wherever {question} appears.
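
As a quick illustration of why (a minimal sketch, not part of the PR): RunnablePassthrough() simply returns whatever input it receives, so when it is used as the value of the "question" key it forwards the entire chain input dictionary rather than just the prompt string.

from langchain_core.runnables import RunnablePassthrough

# RunnablePassthrough returns its input unchanged, so as the value of the
# "question" key it receives the whole chain input dict, not just the prompt.
passthrough = RunnablePassthrough()
print(passthrough.invoke({"input_documents": [], "question": "some prompt"}))
# -> {'input_documents': [], 'question': 'some prompt'}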

This sample code (slightly modified from the example in the documentation)

from bertopic import BERTopic
from bertopic.representation import LangChain

from langchain.chains.question_answering import load_qa_chain
from langchain_core.documents import Document
from langchain_core.runnables import RunnablePassthrough

representation_llm = ...

representation_prompt = "summarize these documents, here are keywords about them [KEYWORDS]"

chain = (
        {
            "input_documents": (
                lambda inp: [
                    Document(
                        page_content=d.page_content.upper()
                    )
                    for d in inp["input_documents"]
                ]
            ),
            "question": RunnablePassthrough(),
        }
        | load_qa_chain(representation_llm, chain_type="stuff")
        | (lambda output: {"output_text": output["output_text"]})
)

representation_model = LangChain(chain, prompt=representation_prompt, nr_docs=2)

docs = [
    "The sky is blue and the sun is shining.",
    "I love going to the beach on sunny days.",
    "Artificial intelligence is transforming the world.",
    "Machine learning enables computers to learn from data.",
    "It's important to wear sunscreen to avoid sunburns.",
    "Deep learning models require a lot of data and computation.",
    "Today's weather forecast predicts a clear sky.",
    "Neural networks are powerful models in AI.",
    "I need to buy a new pair of sunglasses for summer.",
    "Natural language processing is a subset of AI."
]

topic_model = BERTopic(representation_model=representation_model)

topics, probabilities = topic_model.fit_transform(docs)

results in the following prompt being created:

================================ System Message ================================

Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
DEEP LEARNING MODELS REQUIRE A LOT OF DATA AND COMPUTATION.

THE SKY IS BLUE AND THE SUN IS SHINING.
================================ Human Message =================================

{'input_documents': [Document(metadata={}, page_content='Deep learning models require a lot of data and computation.'), Document(metadata={}, page_content='The sky is blue and the sun is shining.')], 'question': 'summarize these documents, here are keywords about them to, is, the, of, learning, ai, data, models, and, sky, networks, neural, new, sunny, wear, on, pair, transforming, powerful, sunscreen, require, sunglasses, sunburns, todays, predicts, processing, subset, sun, summer, shining'}

I'll fix this as well in the PR 😄
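
For reference, one possible correction (a sketch only, assuming the goal is that {question} receives just the formatted prompt; the actual change in the PR may differ) is to replace RunnablePassthrough() with operator.itemgetter("question") so that only the question value is forwarded:

from operator import itemgetter

chain = (
        {
            "input_documents": (
                lambda inp: [
                    Document(
                        page_content=d.page_content.upper()
                    )
                    for d in inp["input_documents"]
                ]
            ),
            # forward only the "question" value instead of the whole input dict
            "question": itemgetter("question"),
        }
        | load_qa_chain(representation_llm, chain_type="stuff")
        | (lambda output: {"output_text": output["output_text"]})
)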
