Haystack API with my own Elasticsearch datastore #2649
Hi, following the instructions in the Haystack API documentation (link) creates 3 Docker containers. The Elasticsearch instance that is created is either empty or filled with deepset's demo data on countries and capitals. I would like to know how I can connect the Haystack API to my own Elasticsearch datastore. Should I edit the elasticsearch part of docker-compose.yml? If so, how should I do it? Or is it possible to follow Tutorial 1, where I create a new document store and initialise the object with my own parameters? And if I do that, where should I write this code? Thank you for your help.
Hi @nniiggeell, the `docker-compose.yml` file is a good starting point, but it can be customised in many ways, depending entirely on your use case.

Let's say you already have an ES cluster running somewhere: you don't need to spin up a container at all, so you can comment out this whole section:

```yaml
# elasticsearch:
#   # This will start an empty Elasticsearch instance (so you have to add your documents yourself)
#   # image: "elasticsearch:7.9.2"
#   # If you want a demo image instead that is "ready-to-query" with some indexed articles
#   # about countries and capital cities from Wikipedia:
#   image: "deepset/elasticsearch-countries-and-capitals"
#   ports:
#     - 9200:9200
#   restart: on-failure
#   environment:
#     - discovery.type=single-node
```

Now you have to tweak the `haystack-api` service:
```yaml
haystack-api:
  build:
    context: .
    dockerfile: Dockerfile
  image: "deepset/haystack-cpu:latest"
  # Mount custom Pipeline YAML and custom Components.
  # volumes:
  #   - ./rest_api/pipeline:/home/user/rest_api/pipeline
  ports:
    - 8000:8000
  restart: on-failure
  environment:
    # See rest_api/pipeline/pipelines.haystack-pipeline.yml for configurations of Search & Indexing Pipeline.
    # The ES host might be a cloud instance now
    - DOCUMENTSTORE_PARAMS_HOST=https://your/cloud/hosted/url/
    # We would probably need to pass some sort of authentication
    - DOCUMENTSTORE_PARAMS_USERNAME=elasticsearch
    - DOCUMENTSTORE_PARAMS_PASSWORD=secret!
    - PIPELINE_YAML_PATH=/home/user/rest_api/pipeline/pipelines.haystack-pipeline.yml
    - CONCURRENT_REQUEST_PER_WORKER
  # We comment out the `depends_on` section, as we no longer depend on the ES container
  # depends_on:
  #   - elasticsearch
  # Starts the REST API with only 2 workers so that it can run on systems with just 4 GB of memory.
  # If you need to handle large loads of incoming requests and have memory to spare,
  # consider increasing the number of workers.
  command: "/bin/bash -c 'sleep 10 && gunicorn rest_api.application:app -b 0.0.0.0 -k uvicorn.workers.UvicornWorker --workers 2 --timeout 180'"
```
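The `DOCUMENTSTORE_PARAMS_*` variables are how the REST API overrides the document store's parameters from the environment: each suffix after the prefix becomes a keyword argument for the document store defined in the pipeline YAML. As a rough illustration of that mapping (a sketch, not Haystack's actual implementation, which handles more cases such as type conversion), collecting the prefixed variables into kwargs looks like this:

```python
def documentstore_params_from_env(environ):
    """Collect DOCUMENTSTORE_PARAMS_* variables into keyword arguments
    for the document store. Illustrative sketch only; Haystack's real
    env-var handling is more involved."""
    prefix = "DOCUMENTSTORE_PARAMS_"
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }

# The environment block from the docker-compose snippet above:
env = {
    "DOCUMENTSTORE_PARAMS_HOST": "https://your/cloud/hosted/url/",
    "DOCUMENTSTORE_PARAMS_USERNAME": "elasticsearch",
    "DOCUMENTSTORE_PARAMS_PASSWORD": "secret!",
    "PIPELINE_YAML_PATH": "/home/user/rest_api/pipeline/pipelines.haystack-pipeline.yml",
}
print(documentstore_params_from_env(env))
# {'host': 'https://your/cloud/hosted/url/', 'username': 'elasticsearch', 'password': 'secret!'}
```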
If you followed the tutorial to the letter, you should have an ES container running locally on your machine. The process is the same as above, except that you manage the ES container manually rather than through Docker Compose, and you point the `haystack-api` service at it. Hope this helps; there's a lot to unpack here, but let me know if you want to dig further into the details.
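As for the Tutorial 1 route from your question: yes, you can also instantiate the document store yourself in plain Python (in whatever script or notebook runs your pipeline) and pass your own connection parameters. A minimal sketch, with placeholder host and credentials that you would substitute with your own:

```python
# Placeholder connection settings -- substitute your own cluster's values.
es_params = dict(
    host="your-cloud-hosted-url",
    port=9200,
    scheme="https",
    username="elasticsearch",
    password="secret!",
    index="document",
)

# Connecting requires a reachable Elasticsearch cluster, so it is
# commented out here:
# from haystack.document_stores import ElasticsearchDocumentStore
# document_store = ElasticsearchDocumentStore(**es_params)
```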