
Update LangChain Support #2188

Open · wants to merge 2 commits into master

Conversation

@Skar0 commented Oct 16, 2024

What does this PR do?

WIP, fixes #2187

@Skar0 (Author) commented Oct 18, 2024

I just realized that the RunnablePassthrough() in the code sample provided in the documentation is not correct: it passes the whole input (the input_documents key plus the question key) through as the value of the question key, so the prompt ends up containing the full input dictionary wherever {question} appears.
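
As a quick illustration of why (a minimal sketch, not part of the PR): RunnablePassthrough() simply returns whatever input it receives, so when it is used as the value of the "question" key it forwards the entire chain input dictionary rather than just the prompt string.

from langchain_core.runnables import RunnablePassthrough

# RunnablePassthrough returns its input unchanged, so as the value of the
# "question" key it receives the whole chain input dict, not just the prompt.
passthrough = RunnablePassthrough()
print(passthrough.invoke({"input_documents": [], "question": "some prompt"}))
# -> {'input_documents': [], 'question': 'some prompt'}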

This sample code (slightly modified from the example in the documentation)

from bertopic import BERTopic
from bertopic.representation import LangChain

from langchain.chains.question_answering import load_qa_chain
from langchain_core.documents import Document
from langchain_core.runnables import RunnablePassthrough

representation_llm = ...

representation_prompt = "summarize these documents, here are keywords about them [KEYWORDS]"

chain = (
        {
            "input_documents": (
                lambda inp: [
                    Document(
                        page_content=d.page_content.upper()
                    )
                    for d in inp["input_documents"]
                ]
            ),
            "question": RunnablePassthrough(),
        }
        | load_qa_chain(representation_llm, chain_type="stuff")
        | (lambda output: {"output_text": output["output_text"]})
)

representation_model = LangChain(chain, prompt=representation_prompt, nr_docs=2)

docs = [
    "The sky is blue and the sun is shining.",
    "I love going to the beach on sunny days.",
    "Artificial intelligence is transforming the world.",
    "Machine learning enables computers to learn from data.",
    "It's important to wear sunscreen to avoid sunburns.",
    "Deep learning models require a lot of data and computation.",
    "Today's weather forecast predicts a clear sky.",
    "Neural networks are powerful models in AI.",
    "I need to buy a new pair of sunglasses for summer.",
    "Natural language processing is a subset of AI."
]

topic_model = BERTopic(representation_model=representation_model)

topics, probabilities = topic_model.fit_transform(docs)

results in the following prompt being created:

================================ System Message ================================

Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
DEEP LEARNING MODELS REQUIRE A LOT OF DATA AND COMPUTATION.

THE SKY IS BLUE AND THE SUN IS SHINING.
================================ Human Message =================================

{'input_documents': [Document(metadata={}, page_content='Deep learning models require a lot of data and computation.'), Document(metadata={}, page_content='The sky is blue and the sun is shining.')], 'question': 'summarize these documents, here are keywords about them to, is, the, of, learning, ai, data, models, and, sky, networks, neural, new, sunny, wear, on, pair, transforming, powerful, sunscreen, require, sunglasses, sunburns, todays, predicts, processing, subset, sun, summer, shining'}

I'll fix this as well in the PR 😄
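
For reference, one possible correction (a sketch only, assuming the goal is that {question} receives just the formatted prompt; the actual change in the PR may differ) is to replace RunnablePassthrough() with operator.itemgetter("question") so that only the question value is forwarded:

from operator import itemgetter

chain = (
        {
            "input_documents": (
                lambda inp: [
                    Document(
                        page_content=d.page_content.upper()
                    )
                    for d in inp["input_documents"]
                ]
            ),
            # forward only the "question" value instead of the whole input dict
            "question": itemgetter("question"),
        }
        | load_qa_chain(representation_llm, chain_type="stuff")
        | (lambda output: {"output_text": output["output_text"]})
)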
