The smallest possible LLM API. Build a question-and-answer interface to your own content in a few minutes. Uses OpenAI embeddings, gpt-3.5 and Faiss, via LangChain.
- Combine your source documents into a single JSON file called source.json. It should look like this:
[
{
"source": "Reference to the source of your content. Typically a title.",
"url": "URL for your source. This key is optional.",
"content": "Your content as a single string. If there's a title or summary, put these first, separated by new lines."
},
...
]
See example.source.json for an example.
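If your content lives in separate text files, a short script can assemble source.json in the format shown above. This is only a sketch and not part of MicroLlama: the function names, the use of the file stem as "source", and the optional base_url used to build "url" values are all assumptions made for illustration.

```python
import json
from pathlib import Path


def build_source_entries(folder, base_url=None):
    """Turn every .txt file in `folder` into one source.json entry.

    Hypothetical helper: uses the file name (without extension) as the
    "source" value and the whole file as "content".
    """
    entries = []
    for path in sorted(Path(folder).glob("*.txt")):
        entry = {
            "source": path.stem,
            "content": path.read_text(encoding="utf-8").strip(),
        }
        # "url" is optional, so only add it when a base URL is given.
        if base_url is not None:
            entry["url"] = f"{base_url.rstrip('/')}/{path.stem}"
        entries.append(entry)
    return entries


def write_source_json(entries, out_path="source.json"):
    """Write the entries as a JSON array, the shape shown above."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)


# Example usage (assumes a local "docs" folder of .txt files):
# write_source_json(build_source_entries("docs"))
```

Putting the title or summary on the first lines of each text file keeps it at the start of "content", as recommended above.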
- Install MicroLlama into a virtual environment:
pip install microllama
- Get an OpenAI API key and add it to the environment, e.g. export OPENAI_API_KEY=sk-etc. Note that indexing and querying require OpenAI credits, which aren't free.
- Run your server with microllama. If a vector search index doesn't exist, it'll be created from your source.json and stored.
- Query your documents at /api/ask?your question.
- Microllama includes an optional web front-end, which is generated with microllama make-front-end. This command creates a single index.html file which you can edit. It's served at /.
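The query endpoint above takes the question as the raw query string, so it needs to be URL-encoded when called from code. The following is a minimal sketch using only the Python standard library. The base URL http://localhost:8080 is an assumption based on the default port, the helper names are hypothetical, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_ask_url(question, base_url="http://localhost:8080"):
    """Build the /api/ask URL, percent-encoding the question."""
    return f"{base_url.rstrip('/')}/api/ask?{quote(question)}"


def ask(question, base_url="http://localhost:8080"):
    """Send the question to a running server and return the raw body.

    Assumes the server is already running and reachable at base_url.
    """
    with urlopen(build_ask_url(question, base_url)) as response:
        return response.read().decode("utf-8")


# Example usage (requires a running server):
# print(ask("What is MicroLlama?"))
```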
Microllama is configured through environment variables, with the following defaults:
- OPENAI_API_KEY: required
- FAISS_INDEX_PATH: "faiss_index"
- SOURCE_JSON: "source.json"
- MAX_RELATED_DOCUMENTS: "5"
- EXTRA_CONTEXT: "Answer in no more than three sentences. If the answer is not included in the context, say 'Sorry, this is no answer for this in my sources.'."
- UVICORN_HOST: "0.0.0.0"
- UVICORN_PORT: "8080"
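To make the "environment variables with defaults" pattern concrete, here is one way such settings could be read in Python. This is an illustrative sketch, not MicroLlama's actual code: load_config and DEFAULTS are hypothetical names, EXTRA_CONTEXT is left out for brevity, and converting the numeric values to int is an assumption.

```python
import os

# Defaults copied from the list above (EXTRA_CONTEXT omitted for brevity).
DEFAULTS = {
    "FAISS_INDEX_PATH": "faiss_index",
    "SOURCE_JSON": "source.json",
    "MAX_RELATED_DOCUMENTS": "5",
    "UVICORN_HOST": "0.0.0.0",
    "UVICORN_PORT": "8080",
}


def load_config(environ=None):
    """Read settings from the environment, falling back to DEFAULTS."""
    environ = os.environ if environ is None else environ
    # OPENAI_API_KEY has no default, so fail early when it is missing.
    if not environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required")
    config = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    config["OPENAI_API_KEY"] = environ["OPENAI_API_KEY"]
    # Environment variables are always strings; convert the numeric ones.
    config["MAX_RELATED_DOCUMENTS"] = int(config["MAX_RELATED_DOCUMENTS"])
    config["UVICORN_PORT"] = int(config["UVICORN_PORT"])
    return config
```

Overriding a value is then a matter of exporting it before starting the server, for example export UVICORN_PORT=9000.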
Create a Dockerfile with microllama make-dockerfile. Then:
Sign up for a Fly.io account and install flyctl. Then:
fly launch # answer no to Postgres, Redis and deploying now
fly secrets set OPENAI_API_KEY=sk-etc
fly deploy
On Google Cloud Run:
gcloud run deploy --source . --set-env-vars="OPENAI_API_KEY=sk-etc"
For Cloud Run and other serverless platforms you should generate the FAISS index
at container build time, to reduce startup time. See the two commented lines in
the Dockerfile.
You can also generate these commands with microllama deploy.
- Langchain
- Simon Willison's blog post, datasette-openai and datasette-faiss.
- FastAPI
- GPT Index
- Dagster blog post
- Use splitting which generates more meaningful fragments, e.g. text_splitter = SpacyTextSplitter(chunk_size=700, chunk_overlap=200, separator=" ")
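To show what the chunk_size, chunk_overlap and separator parameters control, here is a small standard-library sketch of overlapping chunking. It is not LangChain's or spaCy's implementation: SpacyTextSplitter splits on sentence boundaries, whereas this hypothetical split_with_overlap function packs separator-delimited words greedily, which is only a rough approximation of the idea.

```python
def split_with_overlap(text, chunk_size=700, chunk_overlap=200, separator=" "):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share up to chunk_overlap trailing characters, so
    context is not lost at chunk boundaries. A single word longer than
    chunk_size still becomes its own (oversized) chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = [w for w in text.split(separator) if w]
    chunks = []
    current = []      # words in the chunk being built
    current_len = 0   # length of separator.join(current)
    for word in words:
        added = len(word) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            # Drop words from the front until the carried-over tail fits the
            # overlap budget and leaves room for the next word.
            while current and (
                current_len > chunk_overlap
                or current_len + len(separator) + len(word) > chunk_size
            ):
                removed = current.pop(0)
                current_len -= len(removed) + (len(separator) if current else 0)
            added = len(word) + (len(separator) if current else 0)
        current.append(word)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks
```

A larger overlap repeats more text across chunks, which can help retrieval find context that straddles a boundary at the cost of embedding more tokens.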