This project enables users to query a wide variety of documents using an advanced chatbot powered by open-source LLMs such as GPT4All and Llama. Leveraging embeddings, vector databases, and data loaders, the system efficiently handles document parsing, storage, and retrieval.
- 📌 Project Overview
- 🚀 Features
- 📂 Project Structure
- 🛠 Installation
- 🚀 Usage
- 🔗 General Links & Resources
- ⚙️ Configuration
- 🗂️ Supported Document Formats
- 📈 Limitations & Next Steps
- 📄 License
- 📞 Support
## 📌 Project Overview

In today’s data-intensive environments, there’s a growing need to convert unstructured data into actionable insights. This chatbot bridges that gap by allowing users to interactively query documents, with support for multiple formats including PDF, Word, PowerPoint, Markdown, and more.
Built with `langchain` and `chromadb`, this solution processes documents by:
- Converting them into text chunks.
- Embedding these chunks as vectors.
- Storing them for easy retrieval, powered by a selected LLM model.
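A minimal sketch of that flow, assuming a classic `langchain` install; the file name, chunk sizes, and embedding model are illustrative and the actual `ingest.py` may differ:

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# 1. Load the document and split it into overlapping text chunks
documents = PyPDFLoader("example.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(documents)

# 2. Embed each chunk and 3. persist the vectors in a local Chroma store
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")
db.persist()  # flush to disk so the chatbot can reuse the index later
```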
## 🚀 Features

- Multi-format Document Support: Accepts documents in `.pdf`, `.docx`, `.pptx`, `.txt`, and other formats.
- Embeddings with LangChain: Uses `HuggingFaceBgeEmbeddings` for text chunk embeddings.
- Vector Storage with ChromaDB: Stores text embeddings as vectors for efficient retrieval.
- Choice of LLMs: Supports GPT4All and Llama models for answering queries.
- Customizable Environment: Easily configure model and embedding options via `.env`.
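The embedding model is a pluggable piece of this pipeline; a short, hedged sketch of using `HuggingFaceBgeEmbeddings` on its own (the model name below is illustrative, not necessarily the one set in `.env`):

```python
from langchain.embeddings import HuggingFaceBgeEmbeddings

# Illustrative model name; in this project the choice comes from EMBEDDINGS_MODEL_NAME in .env
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},  # BGE models are usually used with normalised vectors
)

query_vector = embeddings.embed_query("What does the contract say about retention?")
print(len(query_vector))  # dimensionality of the embedding vector
```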
## 📂 Project Structure

- `requirements.txt`: Lists necessary Python packages.
- `.env`: Contains environment variables for model and database settings.
- `constants.py`: Holds constants for Chroma database configuration.
- `ingest.py`: Processes and stores documents as vectors for future querying.
- `privateGPT.py`: Main chatbot script for querying stored documents.
## 🛠 Installation

- Clone the repository:

  ```bash
  git clone https://github.com/MoAshour93/Construction_Private_GPT.git
  cd Construction_Private_GPT
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  - Create a `.env` file in the root directory, using the provided template:

    ```
    PERSIST_DIRECTORY=db
    MODEL_TYPE=GPT4All
    MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
    EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
    MODEL_N_CTX=1000
    MODEL_N_BATCH=8
    TARGET_SOURCE_CHUNKS=4
    ```
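These variables are read at startup; as a rough, hedged sketch of how a script like this can pick its LLM from `MODEL_TYPE` and `MODEL_PATH` (not a copy of `privateGPT.py`, and parameter names vary between `langchain` versions):

```python
import os

from dotenv import load_dotenv
from langchain.llms import GPT4All, LlamaCpp

load_dotenv()  # pull the settings from .env into the environment

model_type = os.environ.get("MODEL_TYPE", "GPT4All")
model_path = os.environ.get("MODEL_PATH")
model_n_ctx = int(os.environ.get("MODEL_N_CTX", 1000))
model_n_batch = int(os.environ.get("MODEL_N_BATCH", 8))

if model_type == "GPT4All":
    llm = GPT4All(model=model_path, n_batch=model_n_batch, verbose=False)
elif model_type == "LlamaCpp":
    llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, n_batch=model_n_batch, verbose=False)
else:
    raise ValueError(f"Unsupported MODEL_TYPE: {model_type}")
```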
## 🚀 Usage

Use `ingest.py` to process and store document embeddings:

```bash
python ingest.py
```

Start querying documents using `privateGPT.py`:

```bash
python privateGPT.py
```
- Enter your query at the prompt.
- Type `exit` to end the session.
- Use `--hide-source` or `-S` to hide source documents used in responses.
- Use `--mute-stream` or `-M` to disable streaming output from the LLM.
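The two flags are ordinary command-line switches; a hypothetical `argparse` sketch of how they could be wired up (the real `privateGPT.py` may define them differently):

```python
import argparse

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

parser = argparse.ArgumentParser(description="Query your ingested documents privately.")
parser.add_argument("--hide-source", "-S", action="store_true",
                    help="Do not print the source chunks used to build the answer.")
parser.add_argument("--mute-stream", "-M", action="store_true",
                    help="Disable token-by-token streaming of the LLM output.")
args = parser.parse_args()

# Streaming is implemented as a callback handler; muting simply omits it
callbacks = [] if args.mute_stream else [StreamingStdOutCallbackHandler()]
```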
## 🔗 General Links & Resources

- Our Website: www.apcmasterypath.co.uk
- APC Mastery Path Blogposts: APC Blogposts
- LinkedIn Pages: Personal | APC Mastery Path
## ⚙️ Configuration

- Constants: The `constants.py` file includes important settings for the ChromaDB database.
- Environment Variables: Set customizable parameters in `.env`, including model path and embedding model name.
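As an illustration of how these two pieces usually fit together, a hypothetical `constants.py` might load `.env` and expose shared Chroma settings (field names vary between `chromadb` versions, so treat this as a sketch only):

```python
import os

from chromadb.config import Settings
from dotenv import load_dotenv

load_dotenv()  # make the .env values available via os.environ

PERSIST_DIRECTORY = os.environ.get("PERSIST_DIRECTORY", "db")

# Shared ChromaDB client settings that ingest.py and privateGPT.py can both import
CHROMA_SETTINGS = Settings(
    persist_directory=PERSIST_DIRECTORY,
    anonymized_telemetry=False,  # keep everything strictly local
)
```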
## 🗂️ Supported Document Formats

| Format | Loader |
| --- | --- |
| PDF | PyPDFLoader |
| Word Documents | UnstructuredWordDocumentLoader |
| PowerPoint | UnstructuredPowerPointLoader |
| Markdown | UnstructuredMarkdownLoader |
| CSV | CSVLoader |
| Text | TextLoader |
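Internally, an extension-to-loader mapping is a common way to dispatch between these loaders; a hypothetical sketch (the actual `ingest.py` may organise this differently):

```python
from langchain.document_loaders import (
    CSVLoader,
    PyPDFLoader,
    TextLoader,
    UnstructuredMarkdownLoader,
    UnstructuredPowerPointLoader,
    UnstructuredWordDocumentLoader,
)

# Hypothetical mapping from file extension to the matching loader class
LOADER_MAPPING = {
    ".csv": CSVLoader,
    ".docx": UnstructuredWordDocumentLoader,
    ".md": UnstructuredMarkdownLoader,
    ".pdf": PyPDFLoader,
    ".pptx": UnstructuredPowerPointLoader,
    ".txt": TextLoader,
}

def load_document(path: str):
    """Pick a loader based on the file extension and return the parsed documents."""
    ext = "." + path.rsplit(".", 1)[-1].lower()
    loader_cls = LOADER_MAPPING.get(ext)
    if loader_cls is None:
        raise ValueError(f"Unsupported file type: {ext}")
    return loader_cls(path).load()
```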
## 📈 Limitations & Next Steps

This initial implementation is a command-line chatbot, but it can be extended:

- GUI Integration: Integrate with `Streamlit` or `Chainlit` for a graphical user interface (see the sketch after this list).
- Multi-agent Architecture: Develop task-specific agents for more complex queries.
- Broader LLM Support: Experiment with other open-source models from Hugging Face.
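As a taste of the GUI direction, a minimal Streamlit wrapper over the same vector store and model could look roughly like this (an assumption-heavy sketch, not part of the current repository; paths and model names are illustrative):

```python
import streamlit as st
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import GPT4All
from langchain.vectorstores import Chroma

@st.cache_resource
def build_chain():
    """Build the retrieval QA chain once and cache it across Streamlit reruns."""
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    db = Chroma(persist_directory="db", embedding_function=embeddings)
    llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", verbose=False)
    return RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

st.title("Construction Private GPT")
question = st.text_input("Ask a question about your documents")
if question:
    with st.spinner("Thinking..."):
        result = build_chain()(question)
    st.write(result["result"])
```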
## 📄 License

This project is licensed under the Apache 2.0 License.
## 📞 Support

For any questions, feel free to contact Mohamed Ashour.