batch-gpt

A jump-server that converts OpenAI Chat Completions API requests into batched chat completion requests

It really is as simple as:

from openai import OpenAI
- client = OpenAI(api_key="sk-...")
+ client = OpenAI(api_key="dummy_openai_api_key", base_url="http://batch-gpt")

This is just an example; Batch-GPT works with any OpenAI-compatible client.

Features 🤓👍

  • Seamless Integration: Drop-in replacement for standard OpenAI API clients
  • Cost-Effective:
    1. ⭐ Up to 50% savings using OpenAI's Batch API
    2. Automatic request caching for zero-cost repeat queries
  • Enhanced Reliability: Resumes processing of interrupted batches on server restart
  • Persistent Data: MongoDB integration for cross-session data retention
  • Centralized Management: View all batch statuses at once
  • Interactive Monitoring: Terminal-based UI tool for real-time batch status monitoring
  • Flexible Serving Modes:
    • Synchronous mode for immediate responses
    • Asynchronous mode for handling high-volume requests
    • Cache-only mode for offline operation without API calls
  • Secure Key Distribution: Single OpenAI key for all clients, maintained via Batch-GPT

Limitations 🤔

  • High Turnaround Time: OpenAI's Batch API has a 24-hour SLA (as of 2024-10-10).
  • Not Suitable for Real-Time: Potential delays make it unsuitable for live requests
  • Reliability Measures: While implemented, they may not fully mitigate long processing times

💡 Consider OpenAI's Realtime API for immediate response needs.

Prerequisites

  • Go 1.23.0 or later
  • Docker (and Docker Compose, if running MongoDB through local/mongo/docker-compose.yaml)
  • An OpenAI API key

Setup

You can either build the server from source (for the latest changes) or download pre-compiled binaries.

Option 1: Using Pre-compiled Binaries

  1. Download the latest release for your operating system (darwin/linux/windows) and architecture (amd64/arm64) from the Releases page.

  2. Extract the downloaded archive.

  3. Set up the MongoDB database:

    cd local/mongo
    docker-compose up -d
    cd ../..
    
  4. Set environment variables:

    export OPENAI_API_KEY=your_openai_api_key_here
    export COLLATE_BATCHES_FOR_DURATION_IN_MS=5000
    export MONGO_HOST=localhost
    export MONGO_PORT=27017
    export MONGO_USER=admin
    export MONGO_PASSWORD=password
    export MONGO_DATABASE=batchgpt
    
  5. Run the server:

    ./batch-gpt
    

Option 2: Building from Source

  1. Clone the repository:

    git clone https://github.com/tanmay17061/batch-gpt.git
    cd batch-gpt
    
  2. Set up the MongoDB database:

    cd local/mongo
    docker-compose up -d
    cd ../..
    
  3. Set environment variables:

    export OPENAI_API_KEY=your_openai_api_key_here
    export COLLATE_BATCHES_FOR_DURATION_IN_MS=5000
    export MONGO_HOST=localhost
    export MONGO_PORT=27017
    export MONGO_USER=admin
    export MONGO_PASSWORD=password
    export MONGO_DATABASE=batchgpt
    
  4. Build and run the server:

    go build -o batch-gpt server/main.go
    ./batch-gpt
    

The server will start on http://localhost:8080.

Usage

Note: In asynchronous mode, the server will return immediately with a submission confirmation instead of waiting for the actual response. Look at the Advanced Settings section to learn more about sync/async/cache modes.

Sending Chat Completion Requests

You can send requests to the batch-gpt server using any existing OpenAI client.

Using curl

Send POST requests to /v1/chat/completions with the same format as the OpenAI API. For example:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Using OpenAI Python Library

Make sure you have the OpenAI Python library installed:

pip install openai

from openai import OpenAI

# Create a custom OpenAI client that points to the batch-gpt server
client = OpenAI(
    api_key="dummy_openai_api_key",  # The API key is not used by batch-gpt, but is required by the client
    base_url="http://localhost:8080/v1"  # Point to your batch-gpt server
)

# Send a chat completion request
chat_completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

# Print the response
print(chat_completion.choices[0].message.content)

Checking Batch Status

You can check the status of a batch using any existing OpenAI client.

Using curl

To check the status of a specific batch:

curl http://localhost:8080/v1/batches/{your_batch_id_here}

To retrieve the status of all batches:

curl http://localhost:8080/v1/batches

Using OpenAI Python Client

You can use the OpenAI Python client to check batch statuses. Here's an example:

from openai import OpenAI

# Create a custom OpenAI client that points to the batch-gpt server
client = OpenAI(
    api_key="dummy_openai_api_key",  # The API key is not used by batch-gpt, but is required by the client
    base_url="http://localhost:8080/v1"  # Point to your batch-gpt server
)

# Retrieve the status of a specific batch
batch_id = "your_batch_id_here"
batch_status = client.batches.retrieve(batch_id)

# Print the batch status
print(f"Batch ID: {batch_status.batch.id}")
print(f"Status: {batch_status.batch.status}")
print(f"Created At: {batch_status.batch.created_at}")
print(f"Expires At: {batch_status.batch.expires_at}")
print(f"Request Counts: {batch_status.batch.request_counts}")

# Retrieve the status of all batches
all_batches = client.batches.list()

# Print all batch statuses
for batch in all_batches.data:
    print(f"Batch ID: {batch.id}")
    print(f"Status: {batch.status}")
    print(f"Created At: {batch.created_at}")
    print(f"Expires At: {batch.expires_at}")
    print(f"Request Counts: {batch.request_counts}")
    print("---")

Replace "your_batch_id_here" with the actual batch ID you want to check.

This code will connect to your local batch-gpt server and retrieve the status of either a specific batch or all batches. The response will include details such as the batch ID, status, creation time, expiration time, and request counts.
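
If you need to block until a batch finishes, you can poll the same endpoint in a loop. The following is a minimal sketch: the wait_for_batch helper and its 10-second poll interval are illustrative, and the set of terminal statuses follows OpenAI's Batch API conventions.

import time

from openai import OpenAI

client = OpenAI(
    api_key="dummy_openai_api_key",
    base_url="http://localhost:8080/v1"
)

def wait_for_batch(batch_id, poll_every_s=10):
    # Poll the batch status until it reaches a terminal state.
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = client.batches.retrieve(batch_id)
        print(f"Batch {batch.id}: {batch.status}")
        if batch.status in terminal:
            return batch
        time.sleep(poll_every_s)

final = wait_for_batch("your_batch_id_here")
print(f"Batch finished with status: {final.status}")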

Testing with Python Client

A Python test client is provided in the test-python-client directory.

  1. Install the required Python package:

    cd test-python-client
    pip install -r requirements.txt
    
  2. Run the test client:

    python client.py "Write a joke on Gandalf and Saruman"
    

    Note: To effectively utilize batching, run multiple instances of the Python client simultaneously. This simulates concurrent requests, allowing the server to group them into batches for processing, as shown in the sketch below.
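
A minimal sketch of the same idea from a single script, assuming the server runs in synchronous mode (the prompts and thread count are arbitrary): requests issued concurrently from multiple threads arrive within one collation window (COLLATE_BATCHES_FOR_DURATION_IN_MS), so the server can group them into a single batch.

from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(
    api_key="dummy_openai_api_key",
    base_url="http://localhost:8080/v1"
)

def ask(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Fire several requests at once so the server can collate them into one batch.
prompts = [f"Write a one-line joke about wizard number {i}" for i in range(5)]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)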

Environment Variables

The following environment variables can be used to configure the application:

  • OPENAI_API_KEY: Your OpenAI API key (required)
  • CLIENT_SERVING_MODE: Serving mode; one of "sync", "async", or "cache" (default: "sync")
  • COLLATE_BATCHES_FOR_DURATION_IN_MS: Duration to collate batches in milliseconds (default: 5000)
  • COLLECT_BATCH_STATS_POLLING_MAX_INTERVAL_SECONDS: Maximum interval (in seconds) between polling attempts when collecting batch statistics. This value caps the exponential backoff for long-running batches. Default is 300 seconds (5 minutes) if not set.
  • MONGO_HOST: MongoDB server hostname (default: "localhost")
  • MONGO_PORT: MongoDB server port (default: "27017")
  • MONGO_USER: MongoDB username (default: "admin")
  • MONGO_PASSWORD: MongoDB password (default: "password")
  • MONGO_DATABASE: MongoDB database name (default: "batchgpt")

Advanced Settings

Fine-tune Batch-GPT's behavior with these advanced configuration options for optimal performance in various scenarios.

Serving Modes

Batch-GPT supports three serving modes:

  1. Synchronous Mode (Default):

    • As with standard OpenAI requests, the client remains blocked until its response is ready
    • Ideal for low-volume scenarios where waiting on each response is acceptable
    • Set CLIENT_SERVING_MODE=sync or leave unset
  2. Asynchronous Mode:

    • Returns immediately with a submission confirmation
    • Ideal for high-volume scenarios where each worker remaining blocked on a response is not practical
    • Set CLIENT_SERVING_MODE=async
  3. Cache-only Mode:

    • Allows the server to operate without making new API calls to OpenAI
    • Only serves previously cached responses
    • Still processes any dangling batches from previous sessions
    • Set CLIENT_SERVING_MODE=cache

To change the serving mode, set the CLIENT_SERVING_MODE environment variable before starting the server.
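
As a quick illustration of the caching behavior (a sketch, assuming responses are cached by identical request payloads): in sync mode, repeating the same request should serve the second response from the cache instead of triggering a new batch.

from openai import OpenAI

client = OpenAI(
    api_key="dummy_openai_api_key",
    base_url="http://localhost:8080/v1"
)

request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
}

first = client.chat.completions.create(**request)   # goes through a batch
second = client.chat.completions.create(**request)  # repeat query, served from the cache
print(second.choices[0].message.content)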

Batch Monitor

Batch-GPT includes a terminal-based monitoring tool for real-time batch status tracking:

  1. Start the monitor:

    ./batch-monitor
  2. Features:

    • Real-time status updates for all batches
    • Interactive navigation with keyboard shortcuts
    • Filtering by batch status (Active/Completed/Failed/Expired)
    • Progress tracking for batch requests
    • Detailed batch information display

Batch Statistics Polling

The server uses an exponential backoff strategy when polling for batch statistics to reduce unnecessary API calls for long-running batches. The COLLECT_BATCH_STATS_POLLING_MAX_INTERVAL_SECONDS environment variable sets an upper limit on this interval.

For example:

export COLLECT_BATCH_STATS_POLLING_MAX_INTERVAL_SECONDS=600

This would set the maximum polling interval to 10 minutes. The actual polling interval starts smaller and increases exponentially up to this maximum value.
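
For illustration, a capped doubling schedule produces intervals like the sketch below. Only the cap corresponds to the actual environment variable; the starting interval and growth factor here are assumptions.

import os

max_interval = int(os.environ.get(
    "COLLECT_BATCH_STATS_POLLING_MAX_INTERVAL_SECONDS", "300"))

interval = 10  # assumed starting interval in seconds, for illustration only
schedule = []
for _ in range(8):
    schedule.append(interval)
    interval = min(interval * 2, max_interval)

# With the default cap of 300s: [10, 20, 40, 80, 160, 300, 300, 300]
print(schedule)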

Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to add new features, submit pull requests, and work with the codebase.

Acknowledgements

This project makes extensive use of the go-openai library, which provides a Go client for the OpenAI API. We are grateful to the maintainers and contributors of go-openai for their excellent work, which has significantly simplified our interaction with OpenAI's services.

The go-openai library is used throughout this project for:

  • Defining request and response structures
  • Handling API interactions with OpenAI
  • Implementing batch processing functionality

We encourage users and contributors to this project to also check out and support the go-openai library.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
