LLM API server mockup

This is a simple FastAPI-based server mock that implements the OpenAI API.

Available endpoints:

  • /v1/chat/completion

Instead of running an LLM to generate completions, it simply returns a response generated by a surrogate model. Available surrogate models are:

  • "yes_no": returns random "Yes" or "No" response
  • "ja_nein": returns random "Ja" or "Nein" response
  • "lorem_ipsum": returns random "lorem ipsum" text

Run via docker:

docker pull ghcr.io/hummerichsander/llm_api_server_mock:latest
docker run -p 8000:8000 ghcr.io/hummerichsander/llm_api_server_mock:latest
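
Once the container is running, a completion can be requested with any OpenAI-compatible client or with plain HTTP. The sketch below is an assumption-laden example: it assumes the request body follows the standard OpenAI chat-completion schema and that the model field selects one of the surrogate models listed above:

# Assumption: the mock accepts the standard OpenAI chat-completion request body
curl http://localhost:8000/v1/chat/completion \
  -H "Content-Type: application/json" \
  -d '{"model": "yes_no", "messages": [{"role": "user", "content": "Is the sky blue?"}]}'

With the "yes_no" surrogate model, the response should contain a randomly chosen "Yes" or "No" instead of a real model completion.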

Environment variables:

  • CONTEXT_SIZE: context size for the model (default: 4096)
  • SLEEP_TIME: sleep time in seconds before returning the response (default: 0)
  • MAX_CONCURRENT_REQUESTS: maximum number of concurrent requests (default: 10^9)
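
These variables can be passed to the container with Docker's -e flag; the values below are purely illustrative:

docker run -p 8000:8000 \
  -e CONTEXT_SIZE=8192 \
  -e SLEEP_TIME=1 \
  -e MAX_CONCURRENT_REQUESTS=100 \
  ghcr.io/hummerichsander/llm_api_server_mock:latest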