Haystack API with my own Elasticsearch datastore #2649
Hi, following the instructions in the Haystack API documentation (link) creates 3 Docker containers. The Elasticsearch instance that is created is either empty or filled with deepset's demo data on countries and capitals. I would like to know how I can connect the Haystack API to my own Elasticsearch datastore. Should I edit the elasticsearch part of docker-compose.yml? If so, how should I do it? Or is it possible to follow Tutorial 1, where I create a new document store and initialise the object with my own parameters? And if I do that, where should I write this code? Thank you for your help.
Hi @nniiggeell, the `docker-compose.yml` file is a good starting point, but it can be customised in many ways, depending entirely on your use case.

Let's say you already have an ES cluster running somewhere: you don't need to spin up a container at all, so you can comment out this whole section:

```yaml
# elasticsearch:
#   # This will start an empty Elasticsearch instance (so you have to add your documents yourself)
#   # image: "elasticsearch:7.9.2"
#   # If you want a demo image instead that is "ready-to-query" with some indexed articles
#   # about countries and capital cities from Wikipedia:
#   image: "deepset/elasticsearch-countries-and-capitals"
#   ports:
#     - 9200:9200
#   restart: on-failure
#   environment:
#     - discovery.type=single-node
```

Now you have to tweak the `haystack-api` service:
```yaml
haystack-api:
  build:
    context: .
    dockerfile: Dockerfile
  image: "deepset/haystack-cpu:latest"
  # Mount custom Pipeline YAML and custom Components.
  # volumes:
  #   - ./rest_api/pipeline:/home/user/rest_api/pipeline
  ports:
    - 8000:8000
  restart: on-failure
  environment:
    # See rest_api/pipeline/pipelines.haystack-pipeline.yml for configurations of Search & Indexing Pipeline.
    # The ES host might be a cloud instance now
    - DOCUMENTSTORE_PARAMS_HOST=https://your/cloud/hosted/url/
    # We would probably need to pass some sort of authentication
    - DOCUMENTSTORE_PARAMS_USERNAME=elasticsearch
    - DOCUMENTSTORE_PARAMS_PASSWORD=secret!
    - PIPELINE_YAML_PATH=/home/user/rest_api/pipeline/pipelines.haystack-pipeline.yml
    - CONCURRENT_REQUEST_PER_WORKER
  # We comment out the `depends_on` section, as we no longer depend on the ES container
  # depends_on:
  #   - elasticsearch
  # Starts the REST API with only 2 workers so that it can run on systems with just 4 GB of memory.
  # If you need to handle large loads of incoming requests and have memory to spare,
  # consider increasing the number of workers.
  command: "/bin/bash -c 'sleep 10 && gunicorn rest_api.application:app -b 0.0.0.0 -k uvicorn.workers.UvicornWorker --workers 2 --timeout 180'"
```
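The `DOCUMENTSTORE_PARAMS_*` variables are how the REST API overrides the document store's parameters from the environment: each suffix after the prefix becomes a keyword argument for the document store defined in the pipeline YAML. As a rough illustration of that mapping (a sketch, not Haystack's actual implementation, which handles more cases such as type conversion), collecting the prefixed variables into kwargs looks like this:

```python
def documentstore_params_from_env(environ):
    """Collect DOCUMENTSTORE_PARAMS_* variables into keyword arguments
    for the document store. Illustrative sketch only; Haystack's real
    env-var handling is more involved."""
    prefix = "DOCUMENTSTORE_PARAMS_"
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix)
    }

# The environment block from the docker-compose snippet above:
env = {
    "DOCUMENTSTORE_PARAMS_HOST": "https://your/cloud/hosted/url/",
    "DOCUMENTSTORE_PARAMS_USERNAME": "elasticsearch",
    "DOCUMENTSTORE_PARAMS_PASSWORD": "secret!",
    "PIPELINE_YAML_PATH": "/home/user/rest_api/pipeline/pipelines.haystack-pipeline.yml",
}
print(documentstore_params_from_env(env))
# {'host': 'https://your/cloud/hosted/url/', 'username': 'elasticsearch', 'password': 'secret!'}
```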
If you followed the tutorial to the letter, you should have an ES container running locally on your machine. The process is the same as above, except that you manage the ES container manually rather than through Docker Compose, and you point the `haystack-api` service at it. Hope this helps; there's a lot to unpack here, but let me know if you want to dig further into the details.
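As for the Tutorial 1 route from your question: yes, you can also instantiate the document store yourself in plain Python (in whatever script or notebook runs your pipeline) and pass your own connection parameters. A minimal sketch, with placeholder host and credentials that you would substitute with your own:

```python
# Placeholder connection settings -- substitute your own cluster's values.
es_params = dict(
    host="your-cloud-hosted-url",
    port=9200,
    scheme="https",
    username="elasticsearch",
    password="secret!",
    index="document",
)

# Connecting requires a reachable Elasticsearch cluster, so it is
# commented out here:
# from haystack.document_stores import ElasticsearchDocumentStore
# document_store = ElasticsearchDocumentStore(**es_params)
```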