Text Embeddings Inference (TEI) is a comprehensive toolkit designed for efficient deployment and serving of open-source text embedding models.
It enables us to host our own reranker endpoint seamlessly.
This README provides setup instructions and details for the reranking microservice served via TEI.
To start the Reranking microservice, you must first install the required Python packages.
pip install -r requirements.txt
export HF_TOKEN=${your_hf_api_token}
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export volume=$PWD/data
docker run -d -p 6060:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $RERANK_MODEL_ID --hf-api-token $HF_TOKEN
curl 127.0.0.1:6060/rerank \
-X POST \
-d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
-H 'Content-Type: application/json'
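The same check can be run from Python. The following is a minimal sketch, not part of this repository: the helper names `build_rerank_payload`, `rank_texts`, and `rerank` are hypothetical, and it assumes the TEI `/rerank` endpoint replies with a JSON list of objects that carry `index` and `score` fields. It uses only the standard library.

```python
import json
import urllib.request


def build_rerank_payload(query, texts):
    """Build the same JSON body that the curl command sends."""
    return {"query": query, "texts": list(texts)}


def rank_texts(texts, results):
    """Pair each text with its score, highest score first.

    Assumption: `results` is a list of {"index": int, "score": float} items.
    """
    ordered = sorted(results, key=lambda item: item["score"], reverse=True)
    return [(texts[item["index"]], item["score"]) for item in ordered]


def rerank(endpoint, query, texts, timeout=30):
    """POST to <endpoint>/rerank and return (text, score) pairs."""
    body = json.dumps(build_rerank_payload(query, texts)).encode("utf-8")
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.loads(response.read().decode("utf-8"))
    return rank_texts(texts, results)
```

With the TEI container running, `rerank("http://127.0.0.1:6060", "What is Deep Learning?", ["Deep Learning is not...", "Deep learning is..."])` would return the two texts ordered by relevance score.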
export TEI_RERANKING_ENDPOINT="http://${your_ip}:6060"
python reranking_tei_xeon.py
If you start the Reranking microservice with Docker, the docker_compose_reranking.yaml
file will automatically start a TEI service with Docker as well.
export HF_TOKEN=${your_hf_api_token}
export TEI_RERANKING_ENDPOINT="http://${your_ip}:8808"
cd ../../../
docker build -t opea/reranking-tei:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/reranks/tei/Dockerfile .
To start a Docker container, you have two options:
- A. Run Docker with CLI
- B. Run Docker with Docker Compose
You can choose one as needed.
docker run -d --name="reranking-tei-server" -p 8000:8000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_RERANKING_ENDPOINT=$TEI_RERANKING_ENDPOINT -e HF_TOKEN=$HF_TOKEN opea/reranking-tei:latest
docker compose -f docker_compose_reranking.yaml up -d
The Reranking microservice exposes the following API endpoints:
- Check Service Status
curl http://localhost:8000/v1/health_check \
-X GET \
-H 'Content-Type: application/json'
- Execute reranking process by providing query and documents
curl http://localhost:8000/v1/reranking \
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
-H 'Content-Type: application/json'
- You can add the parameter top_n to specify how many results the reranker model returns; the default value is 1.
curl http://localhost:8000/v1/reranking \
-X POST \
-d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}], "top_n":2}' \
-H 'Content-Type: application/json'
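When the request body is assembled in code rather than typed by hand, a small helper keeps the field names consistent. This is a minimal sketch under stated assumptions: `build_reranking_request` is a hypothetical name, not a function from this repository, and the body only mirrors the fields shown in the curl examples (`initial_query`, `retrieved_docs`, `top_n`).

```python
import json


def build_reranking_request(initial_query, docs, top_n=None):
    """Build the JSON body for POST /v1/reranking.

    `top_n` is omitted when None so that the service default applies.
    """
    body = {
        "initial_query": initial_query,
        # Each document is wrapped as {"text": ...}, as in the curl examples.
        "retrieved_docs": [{"text": doc} for doc in docs],
    }
    if top_n is not None:
        body["top_n"] = top_n
    return body


payload = build_reranking_request(
    "What is Deep Learning?",
    ["Deep Learning is not...", "Deep learning is..."],
    top_n=2,
)
print(json.dumps(payload))
```

The printed JSON can be passed to curl with `-d`, or sent with any HTTP client.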
- You can add the parameter