[BUG] Fail to chat with GraphRAG #415

Open
CinderZhang opened this issue Oct 20, 2024 · 9 comments
Labels: bug (Something isn't working)

@CinderZhang

Description

```
Setting up quick upload event
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x00000273B5140CA0>, FSPath=WindowsPath('R:/kotaemon-app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x00000273B5140F40>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EB60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EF20>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734D420>), mmr=False, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, FSPath=<theflow.base.unset_ object at 0x00000273FB1E1F60>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x00000273FB1E1F60>, VS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, file_ids=['e6ae8d9e-2419-47bd-b6e2-3607d7f5ced2'], user_id=<theflow.base.unset_ object at 0x00000273FB1E1F60>)]
searching in doc_ids []
Traceback (most recent call last):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\pages\chat\__init__.py", line 899, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 705, in stream
    docs, infos = self.retrieve(message, history)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 503, in retrieve
    retriever_docs = retriever_node(text=query)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1097, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\backends\base.py", line 151, in exec
    return run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 144, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 345, in run
    context_builder = self._build_graph_search()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 204, in _build_graph_search
    entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 667, in read_parquet
    return impl.read(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'R:\\kotaemon-app\\ktem_app_data\\user_data\\files\\graphrag\\a8af56b7-550c-4f92-ba60-fcf2163838b7\\output/create_final_nodes.parquet'
User-id: 1, can see public conversations: True
```
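The missing file in the traceback, `create_final_nodes.parquet`, is one of the artifacts the GraphRAG indexing step is supposed to write at upload time, so the chat failure is a symptom of indexing never completing. A first diagnostic (a sketch using the path from the error above; the UUID folder will differ per install) is to see what the indexer actually produced:

```sh
# List the GraphRAG output artifacts for the failing collection
# (folder name taken from the FileNotFoundError above; use `dir` in cmd.exe).
ls "R:/kotaemon-app/ktem_app_data/user_data/files/graphrag/a8af56b7-550c-4f92-ba60-fcf2163838b7/output"
# If the folder is missing or contains no *.parquet files, indexing failed;
# re-upload the file and watch the console for the underlying GraphRAG error.
```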

Reproduction steps

1. Go to 'File->GraphRAG'
2. Click on 'Upload'
3. Ask anything in the chat 
4. See error

Screenshots

![DESCRIPTION](LINK.png)

Logs

No response

Browsers

No response

OS

No response

Additional information

I installed it with "....bat" on a Windows system.

CinderZhang added the bug label on Oct 20, 2024
@CaMi1le

CaMi1le commented Oct 21, 2024

Same problem; I'm already on the latest version on Linux, installed with run_linux.sh, and I'm still facing the issue of the GraphRAG part not working.

I tried modifying the run_linux.sh part like below:

```sh
if pip list 2>/dev/null | grep -q "kotaemon"; then
    python -m pip install graphrag future  # <-- new line
    echo "Requirements are already installed"
else
    ..........
```
Now I get this error:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.39.0 requires aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0 which is incompatible.
kotaemon 0.7.0 requires tenacity<8.3,>=8.2.3, but you have tenacity 9.0.0 which is incompatible.
langchain 0.2.15 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.2.11 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.2.41 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-core 0.10.68.post1 requires tenacity!=8.4.0,<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-legacy 0.9.48.post3 requires tenacity<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
```

So it seems to be an environment error. I have not yet looked through the pip list, which may reveal something.
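One hedged way around the conflicts (untested; the pins are taken straight from the resolver messages above, not from an official requirements file) is to install graphrag and then force the conflicting packages back into the ranges kotaemon and gradio expect:

```sh
# Install graphrag, then restore the versions the resolver complained about.
python -m pip install graphrag future
python -m pip install "tenacity>=8.2.3,<8.3" "aiofiles>=22.0,<24.0"
```

Whether graphrag itself still works with the older tenacity needs verifying; if it does not, the requirements are genuinely incompatible and graphrag would need its own virtual environment.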

@taprosoft
Collaborator

Did you set the GraphRAG API key correctly as mentioned in https://github.com/Cinnamon/kotaemon#setup-graphrag?
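For anyone verifying this, a minimal sanity check (assuming you run the app from the folder containing `.env`, which kotaemon reads at startup):

```sh
# The key must be present and not left as the template placeholder.
grep GRAPHRAG_API_KEY .env
# Expect something like: GRAPHRAG_API_KEY=sk-...
# A literal <YOUR_OPENAI_KEY> placeholder will not authenticate against OpenAI.
```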

@piyush-vaghela-solutelabs

I got the same error even though I have set the GraphRAG API key in the .env file.

@ajayarunachalam

The same error persists even with the GraphRAG API key set in the .env file.

@sunnf8888

I have the same error. Here is my configuration:
settings.yaml:

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  api_base: http://127.0.0.1:11434/v1
  model: llama3.1:8b
  model_supports_json: true # recommended if this is available for your model.
  request_timeout: 1800.0
  concurrent_requests: 5 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat # or azure_openai_chat
    api_base: http://127.0.0.1:11434/v1
    model: nomic-embed-text
    type: openai_embedding
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```
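One way to separate a config problem from an app problem is to run the GraphRAG indexer by hand against the collection folder and read its errors directly (a sketch assuming graphrag 0.x, whose indexer is invoked as `python -m graphrag.index`; `<collection-id>` stands for the UUID folder kotaemon created for your upload):

```sh
# Run the GraphRAG 0.x indexer manually on the collection that kotaemon created
# (the folder containing settings.yaml and input/).
cd ktem_app_data/user_data/files/graphrag/<collection-id>
python -m graphrag.index --root .
# Whatever error this prints is the reason create_final_nodes.parquet
# was never written to output/.
```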

.env:

```sh
# settings for OpenAI
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_BASE=https://api.deepseek.com/v1
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text

# settings for GraphRAG
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
GRAPHRAG_LLM_MODEL=llama3.1:8b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text

# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=true

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
```

```
(base) root@autodl-container-3c3348b04d-889a978b:~# ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B
verifying sha256 digest
writing manifest
success

(base) root@autodl-container-3c3348b04d-889a978b:~/autodl-tmp/kotaemon_071# ollama pull llama3.1:8b
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 948af2743fc7... 100% ▕████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B
pulling 1a4c3c319823... 100% ▕████████████████▏  485 B
verifying sha256 digest
writing manifest
success
```
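Successful pulls only show the models are present. With settings.yaml pointing `api_base` at Ollama, `GRAPHRAG_API_KEY` mostly just needs to be non-empty (Ollama ignores it), but the endpoint itself must answer. A quick reachability check (assuming a default local Ollama, which serves an OpenAI-compatible API under `/v1`):

```sh
# Confirm Ollama's OpenAI-compatible endpoint answers and lists the pulled models.
curl http://127.0.0.1:11434/v1/models
```

If this fails, the GraphRAG indexer cannot reach the model and will fail before writing its parquet outputs.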

@sunnf8888

Same error with `output/create_final_nodes.parquet`.

@joreyolo

Same error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/app/ktem_app_data/user_data/files/graphrag/d6d06e52-7acf-4ec6-b1f0-ec84b86fedaa/output/create_final_nodes.parquet'
```


@sunnf8888

The server I'm using is an AutoDL server; I don't know whether that is related.

```
FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl-tmp/kotaemon_l/kotaemon/ktem_app_data/user_data/files/graphrag/2d1932f9-2623-406c-b72bnodes.parquet'
```
