[BUG] Fail to chat with GraphRAG #415

Open
CinderZhang opened this issue Oct 20, 2024 · 9 comments
Labels: bug (Something isn't working)

@CinderZhang

Description

```
Setting up quick upload event
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
User-id: None, can see public conversations: False
User-id: 1, can see public conversations: True
User-id: 1, can see public conversations: True
Session reasoning type None
Session LLM None
Reasoning class <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state {'app': {'regen': False}, 'pipeline': {}}
Thinking ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x00000273B5140CA0>, FSPath=WindowsPath('R:/kotaemon-app/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x00000273B5140F40>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EB60>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734EF20>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x00000273B734D420>), mmr=False, rerankers=[CohereReranking(cohere_api_key='<COHERE_API_KEY>', model_name='rerank-multilingual-v2.0')], retrieval_mode='hybrid', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, FSPath=<theflow.base.unset_ object at 0x00000273FB1E1F60>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x00000273FB1E1F60>, VS=<theflow.base.unset_ object at 0x00000273FB1E1F60>, file_ids=['e6ae8d9e-2419-47bd-b6e2-3607d7f5ced2'], user_id=<theflow.base.unset_ object at 0x00000273FB1E1F60>)]
searching in doc_ids []
Traceback (most recent call last):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\gradio\utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\pages\chat\__init__.py", line 899, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 705, in stream
    docs, infos = self.retrieve(message, history)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\reasoning\simple.py", line 503, in retrieve
    retriever_docs = retriever_node(text=query)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1097, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1088, in __call__
    output = self.fl.exec(func, args, kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\backends\base.py", line 151, in exec
    return run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 144, in __call__
    raise e from None
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 141, in __call__
    _output = self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\middleware.py", line 117, in __call__
    return self.next_call(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\theflow\base.py", line 1017, in _runx
    return self.run(*args, **kwargs)
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 345, in run
    context_builder = self._build_graph_search()
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\ktem\index\file\graph\pipelines.py", line 204, in _build_graph_search
    entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 667, in read_parquet
    return impl.read(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 267, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\parquet.py", line 140, in _get_path_or_handle
    handles = get_handle(
  File "R:\kotaemon-app\install_dir\env\lib\site-packages\pandas\io\common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'R:\\kotaemon-app\\ktem_app_data\\user_data\\files\\graphrag\\a8af56b7-550c-4f92-ba60-fcf2163838b7\\output/create_final_nodes.parquet'
User-id: 1, can see public conversations: True
```
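The missing file in the traceback, `create_final_nodes.parquet`, is one of the artifacts the GraphRAG indexing step is supposed to write at upload time, so the chat failure is a symptom of indexing never completing. A first diagnostic (a sketch using the path from the error above; the UUID folder will differ per install) is to see what the indexer actually produced:

```sh
# List the GraphRAG output artifacts for the failing collection
# (folder name taken from the FileNotFoundError above; use `dir` in cmd.exe).
ls "R:/kotaemon-app/ktem_app_data/user_data/files/graphrag/a8af56b7-550c-4f92-ba60-fcf2163838b7/output"
# If the folder is missing or contains no *.parquet files, indexing failed;
# re-upload the file and watch the console for the underlying GraphRAG error.
```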

Reproduction steps

1. Go to 'File->GraphRAG'
2. Click on 'Upload'
3. Ask anything in the chat 
4. See error

Screenshots

![DESCRIPTION](LINK.png)

Logs

No response

Browsers

No response

OS

No response

Additional information

I installed it with "....bat" on a Windows system.

CinderZhang added the bug label on Oct 20, 2024
@CaMi1le

CaMi1le commented Oct 21, 2024

Same problem; I'm already on the latest version on Linux, installed with run_linux.sh, and I'm still facing the issue of the GraphRAG part not working.

I tried modifying the run_linux.sh part like below:

```sh
if pip list 2>/dev/null | grep -q "kotaemon"; then
    python -m pip install graphrag future  # <-- new line
    echo "Requirements are already installed"
else
    ..........
```
Now I get this error:

```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gradio 4.39.0 requires aiofiles<24.0,>=22.0, but you have aiofiles 24.1.0 which is incompatible.
kotaemon 0.7.0 requires tenacity<8.3,>=8.2.3, but you have tenacity 9.0.0 which is incompatible.
langchain 0.2.15 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.2.11 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.2.41 requires tenacity!=8.4.0,<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-core 0.10.68.post1 requires tenacity!=8.4.0,<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
llama-index-legacy 0.9.48.post3 requires tenacity<9.0.0,>=8.2.0, but you have tenacity 9.0.0 which is incompatible.
```

So it seems to be an environment error. I have not yet looked through the pip list, which may reveal something.
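One hedged way around the conflicts (untested; the pins are taken straight from the resolver messages above, not from an official requirements file) is to install graphrag and then force the conflicting packages back into the ranges kotaemon and gradio expect:

```sh
# Install graphrag, then restore the versions the resolver complained about.
python -m pip install graphrag future
python -m pip install "tenacity>=8.2.3,<8.3" "aiofiles>=22.0,<24.0"
```

Whether graphrag itself still works with the older tenacity needs verifying; if it does not, the requirements are genuinely incompatible and graphrag would need its own virtual environment.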

@taprosoft
Collaborator

Did you set the GraphRAG API key correctly as mentioned in https://github.com/Cinnamon/kotaemon#setup-graphrag?
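For anyone verifying this, a minimal sanity check (assuming you run the app from the folder containing `.env`, which kotaemon reads at startup):

```sh
# The key must be present and not left as the template placeholder.
grep GRAPHRAG_API_KEY .env
# Expect something like: GRAPHRAG_API_KEY=sk-...
# A literal <YOUR_OPENAI_KEY> placeholder will not authenticate against OpenAI.
```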

@piyush-vaghela-solutelabs

I got the same error even though I have set the GraphRAG API key in the .env file.

@ajayarunachalam

The same error persists even with the GraphRAG API key set in the .env file.

@sunnf8888

I have the same error. Here is my configuration:
settings.yaml:

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  api_base: http://127.0.0.1:11434/v1
  model: llama3.1:8b
  model_supports_json: true # recommended if this is available for your model.
  request_timeout: 1800.0
  concurrent_requests: 5 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat # or azure_openai_chat
    api_base: http://127.0.0.1:11434/v1
    model: nomic-embed-text
    type: openai_embedding
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"

storage:
  type: file # or blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## strategy: fully override the entity extraction strategy.
  ##   type: one of graph_intelligence, graph_intelligence_json and nltk
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: true # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: true # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: true
  raw_entities: true
  top_level_nodes: true

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
```
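One way to separate a config problem from an app problem is to run the GraphRAG indexer by hand against the collection folder and read its errors directly (a sketch assuming graphrag 0.x, whose indexer is invoked as `python -m graphrag.index`; `<collection-id>` stands for the UUID folder kotaemon created for your upload):

```sh
# Run the GraphRAG 0.x indexer manually on the collection that kotaemon created
# (the folder containing settings.yaml and input/).
cd ktem_app_data/user_data/files/graphrag/<collection-id>
python -m graphrag.index --root .
# Whatever error this prints is the reason create_final_nodes.parquet
# was never written to output/.
```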

.env:

```sh
# settings for OpenAI
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_BASE=https://api.deepseek.com/v1
OPENAI_API_KEY=
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002

# settings for Azure OpenAI
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002

# settings for Cohere
COHERE_API_KEY=<COHERE_API_KEY>

# settings for local models
LOCAL_MODEL=llama3.1:8b
LOCAL_MODEL_EMBEDDINGS=nomic-embed-text

# settings for GraphRAG
GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
GRAPHRAG_LLM_MODEL=llama3.1:8b
GRAPHRAG_EMBEDDING_MODEL=nomic-embed-text

# set to true if you want to use customized GraphRAG config file
USE_CUSTOMIZED_GRAPHRAG_SETTING=true

# settings for Azure DI
AZURE_DI_ENDPOINT=
AZURE_DI_CREDENTIAL=

# settings for Adobe API
# get free credential at https://acrobatservices.adobe.com/dc-integration-creation-app-cdn/main.html?api=pdf-extract-api
# also install pip install "pdfservices-sdk@git+https://github.com/niallcm/pdfservices-python-sdk.git@bump-and-unfreeze-requirements"
PDF_SERVICES_CLIENT_ID=
PDF_SERVICES_CLIENT_SECRET=

# settings for PDF.js
PDFJS_VERSION_DIST="pdfjs-4.0.379-dist"
```

```
(base) root@autodl-container-3c3348b04d-889a978b:~# ollama pull nomic-embed-text
pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB
pulling c71d239df917... 100% ▕████████████████▏  11 KB
pulling ce4a164fc046... 100% ▕████████████████▏   17 B
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B
verifying sha256 digest
writing manifest
success

(base) root@autodl-container-3c3348b04d-889a978b:~/autodl-tmp/kotaemon_071# ollama pull llama3.1:8b
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕████████████████▏ 4.7 GB
pulling 948af2743fc7... 100% ▕████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏   96 B
pulling 1a4c3c319823... 100% ▕████████████████▏  485 B
verifying sha256 digest
writing manifest
success
```
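Successful pulls only show the models are present. With settings.yaml pointing `api_base` at Ollama, `GRAPHRAG_API_KEY` mostly just needs to be non-empty (Ollama ignores it), but the endpoint itself must answer. A quick reachability check (assuming a default local Ollama, which serves an OpenAI-compatible API under `/v1`):

```sh
# Confirm Ollama's OpenAI-compatible endpoint answers and lists the pulled models.
curl http://127.0.0.1:11434/v1/models
```

If this fails, the GraphRAG indexer cannot reach the model and will fail before writing its parquet outputs.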

@sunnf8888

Same error with `output/create_final_nodes.parquet`.

@joreyolo

Same error:

```
FileNotFoundError: [Errno 2] No such file or directory: '/app/ktem_app_data/user_data/files/graphrag/d6d06e52-7acf-4ec6-b1f0-ec84b86fedaa/output/create_final_nodes.parquet'
```


@sunnf8888

The server I'm using is an AutoDL server; I don't know whether that is related.

```
FileNotFoundError: [Errno 2] No such file or directory: '/root/autodl-tmp/kotaemon_l/kotaemon/ktem_app_data/user_data/files/graphrag/2d1932f9-2623-406c-b72bnodes.parquet'
```
