From 1ed773862bb381797d9570ceb3dade7156dda745 Mon Sep 17 00:00:00 2001 From: khituras Date: Thu, 29 Dec 2022 14:56:37 +0100 Subject: [PATCH] Add documentation. --- README.md | 91 ++--------------- engine/.mvn/maven-settings.xml | 24 +++++ engine/Dockerfile | 3 +- engine/README.md | 98 +++++++++++++++++++ .../src/main/resources/application.properties | 2 +- 5 files changed, 133 insertions(+), 85 deletions(-) create mode 100644 engine/.mvn/maven-settings.xml create mode 100644 engine/README.md diff --git a/README.md b/README.md index 65cfb63..b542a42 100644 --- a/README.md +++ b/README.md @@ -2,94 +2,21 @@ This search engine was developed in the context of the [SMITH](https://www.smith.care/de/) project. It aims to specifically offer search capabilities for medical text in German. Its main feature is the seamless integration of semantic concepts, called named entities, into the search index. This allows to search for canonical IDs of diseases, medications or possibly other entity types. Thus, instead of providing the search engine with synonyms and writing variants of the same concept in order to retrieve as much relevant documents as possible, these steps are handled in the preprocessing step and woven into the index. The engine offers faceting and highlighting capabilities that work with normal text queries as well as entity IDs. Entity IDs and normal words can be used in arbitrary combinations since the entity IDs are just words from the perspective of the search index. -The code for the search engine consists of two parts, the indexing pipeline and this Web application. The indexing pipeline, that is used to read documents, detect entities and create index documents, is found in this Git repository in the `smithsearch-indexing-pipeline` directory. - -This application is built with [Spring Boot](https://spring.io/projects/spring-boot) and relies on [ElasticSearch](https://www.elastic.co/) for its search capabilities. 
The indexing pipeline is built with [UIMA](https://uima.apache.org/) using [JCoRe](https://github.com/JULIELab/jcore-base) components. - -An ElasticSearch instance can be quickly provided using Docker. See the directory `../es-docker` for instructions. - -## Quickstart - -The quickest way to start up the pipeline application is to use the official Docker image like this: -``` -docker run --rm -p 8080:8080 -v julielab/smithsearch:1.0.0 -``` - -Then, the web service will be available at `http://localhost:8080/search`. - -## Web service Usage - -The Web service offers a REST interface to the `/search` endpoint. Search requests are sent there using the `POST` HTTP method. The request body must be a JSON object in the following format: - +Consider this example request: ```json -{ - "query": "...", - "from": 0, - "size": 10, - "doHighlighting": true, - "doFaceting": true -} +curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":1,"doHighlighting":true}' ``` -Where -* `query` is an ElasticSearch [Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-simple-query-string-query.html) with flags set to `ALL`. This query allows boolean expression using `+` as AND, `|` as OR and `-` as negation. Refer to the documentation to find all query possibilities. -* `from` is a number that specifies the result offset from which the result documents should be returned. This can be used for result paging. -* `size` is a number that specifies the number of results to return beginning from `from`. -* `doHighlighting` is a boolean value, `true` or `false`. It toggles the creation of snippets that use HTML tags to mark query matches in the document text. -* `doFaceting` is a boolean value, `true` or `false`. It toggles the calculation of the top 10 entity IDs for the query result. - -The following sections describe how to use the Web service in different scenarios. 
- -### As a development version with Maven - -Use Maven to quickly run the application without the need to build JAR files: +Given that matching documents exist in the search index, the response looks like this: -`./mvnw spring-boot:run` - -### As a Java application - -Compile the application into an executable JAR with the Maven command `mvn clean package`. The application can be run with a command like -``` -java -jar target/smithsearch-1.0.0.jar -``` - -### As a Docker container - -A Docker container with the search application has been published to Docker Hub named `julielab/smithsearch:1.0.0`. Alternatively, this GitHub repository contains a Dockerfile that can be used to create a local Docker image. The next sections show how to use a Docker container as a Web service. A running Docker installation is required. - -All commands specified in this README specify the `--rm` option that will remove the container after it is stopped. Since the application does not have an internal state, it is not necessary to keep the container. The `-p 8080:8080` option maps the container-internal port 8080 to the host port 8080. The second number can be changed to use another host port. - -#### Run the official Docker container from Docker Hub - -On the command line, type -``` -docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0 +```json +{"hits":[{"docId":"1234","text":"Eine 76-jährige Patientin meldet sich in der Sprechstunde an, weil sie seit einiger Zeit an Husten leidet. 
[...]","highlights":["Eine 76-jährige Patientin meldet sich in der Sprechstunde an, weil sie seit einiger Zeit an Husten leidet","einem Atemwegsinfekt mit Schnupfen, Gliederschmerzen, Abgeschlagenheit, leichtem Fieber und leichtem Husten"]}],"numHits":61,"numHitsRelation":"Eq","entityIdCounts":[{"entityId":"R05","count":61},{"entityId":"R06.0","count":14},{"entityId":"Z01.7","count":7},{"entityId":"R07.0","count":5},{"entityId":"E66.-","count":4},{"entityId":"I50.-","count":4},{"entityId":"R29.1","count":4},{"entityId":"B05.-","count":3},{"entityId":"B26.-","count":3},{"entityId":"G93.6","count":3}]} ``` -This will download the official image, create and run a container. The web application is then available under port 8080 with path `/search`. - -#### With a Docker image built from the repository code - -The `Dockerfile` in the repository allows to create a new, local Docker image from scratch. Run -``` -docker build . -t mypsearchwebapp:1.0.0 +``` -to create a new image named `mypsearchwebapp` with version `1.0.0.`. Create and run a container using -``` -docker run --rm -p 8080:8080 mypsearchwebapp:1.0.0-SNAPSHOT -``` -just as with the official image. +Note that `Husten` (cough) is highlighted in response to a search for `R05`. Also note that `R05` has the highest `entityId` count because it was the search query. +The code for the search engine consists of two parts, the indexing pipeline and a Web application. The Web application code is located in the `engine` directory. The indexing pipeline, which is used to read documents, detect entities and create index documents, is found in the `smithsearch-indexing-pipeline` directory of this Git repository. +This application is built with [Spring Boot](https://spring.io/projects/spring-boot) and relies on [ElasticSearch](https://www.elastic.co/) for its search capabilities. The indexing pipeline is built with [UIMA](https://uima.apache.org/) using [JCoRe](https://github.com/JULIELab/jcore-base) components. 
-### Testing the web service -On *nix-based systems, use cURL to send a test document: -``` -curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":5,"doHighlighting":true,"doFaceting":true}' -``` -the response will return matched documents to the query term `R05` (the ICD10 code for "Husten") if any are found in the index. Otherwise, the response will indicate that no documents were found. +An ElasticSearch instance can be quickly provided using Docker. See the directory `../es-docker` for instructions. -On Windows, use the PowerShell like this: -``` -PS> $inputText=ConvertTo-Json @(@{query="R05";from=0;size=5;doHighlighting=true;doFaceting:true}) -PS> Invoke-RestMethod -Method POST -ContentType "application/json" -uri http://localhost:8080/search -Body $inputText -``` \ No newline at end of file diff --git a/engine/.mvn/maven-settings.xml b/engine/.mvn/maven-settings.xml new file mode 100644 index 0000000..9c8a6c4 --- /dev/null +++ b/engine/.mvn/maven-settings.xml @@ -0,0 +1,24 @@ + + + + + sonatype-snapshots + + + sonatype-nexus-snapshots + Sonatype Nexus Snapshots + https://oss.sonatype.org/content/repositories/snapshots + + false + + + true + + + + + + + sonatype-snapshots + + \ No newline at end of file diff --git a/engine/Dockerfile b/engine/Dockerfile index 8bc13fc..63a0131 100644 --- a/engine/Dockerfile +++ b/engine/Dockerfile @@ -4,9 +4,8 @@ FROM eclipse-temurin:17-jdk-jammy AS build WORKDIR /app COPY .mvn/ .mvn COPY mvnw pom.xml ./ -#RUN ./mvnw dependency:resolve COPY src ./src -RUN ./mvnw clean package --settings .mvn/maven-settings.xml +RUN ./mvnw clean package -DskipTests=true --settings .mvn/maven-settings.xml FROM eclipse-temurin:17 diff --git a/engine/README.md b/engine/README.md new file mode 100644 index 0000000..a637469 --- /dev/null +++ b/engine/README.md @@ -0,0 +1,98 @@ +## Quickstart + +The quickest way to start up the Web application is to use the official Docker image like this: +``` 
+docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
+```
+
+Then, the web service will be available at `http://localhost:8080/search`. An ElasticSearch instance is expected at `http://localhost:9200`.
+
+## Web service Usage
+
+The Web service offers a REST interface at the `/search` endpoint. Search requests are sent there using the `POST` HTTP method. The request body must be a JSON object in the following format:
+
+```json
+{
+  "query": "...",
+  "from": 0,
+  "size": 10,
+  "doHighlighting": true,
+  "doFaceting": true
+}
+```
+
+Where
+* `query` is an ElasticSearch [Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-simple-query-string-query.html) with flags set to `ALL`. This query type allows Boolean expressions using `+` as AND, `|` as OR and `-` as negation. Refer to the linked documentation for all query options.
+* `from` is a number that specifies the offset of the first result document to return. This can be used for result paging.
+* `size` is a number that specifies the number of results to return, beginning at `from`.
+* `doHighlighting` is a boolean value, `true` or `false`. It toggles the creation of snippets that use HTML tags to mark query matches in the document text.
+* `doFaceting` is a boolean value, `true` or `false`. It toggles the calculation of the top 10 entity IDs for the query result.
+
+The following sections describe how to run the Web service in different scenarios.
+
+### As a development version with Maven
+
+Use Maven to quickly run the application without the need to build JAR files:
+
+`./mvnw spring-boot:run`
+
+### As a Java application
+
+Compile the application into an executable JAR with the Maven command `mvn clean package`.
The application can be run with a command like
+```
+java -jar target/smithsearch-1.0.0.jar
+```
+
+### As a Docker container
+
+A Docker image with the search application has been published to Docker Hub as `julielab/smithsearch:1.0.0`. Alternatively, this GitHub repository contains a Dockerfile that can be used to create a local Docker image. The next sections show how to use a Docker container as a Web service. A running Docker installation is required.
+
+All commands in this README use the `--rm` option, which removes the container after it is stopped. Since the application does not have an internal state, it is not necessary to keep the container. The `-p 8080:8080` option maps the container-internal port 8080 to the host port 8080. The first number can be changed to use another host port.
+
+#### Run the official Docker container from Docker Hub
+
+On the command line, type
+```
+docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
+```
+
+This will download the official image, then create and run a container. The web application is then available on port 8080 under the path `/search`.
+
+#### With a Docker image built from the repository code
+
+The `Dockerfile` in the repository can be used to build a new, local Docker image from scratch. Run
+```
+docker build . -t mypsearchwebapp:1.0.0
+```
+to create a new image named `mypsearchwebapp` with version `1.0.0`. Create and run a container using
+```
+docker run --rm -p 8080:8080 mypsearchwebapp:1.0.0
+```
+just as with the official image.
+
+
+### Testing the web service
+On *nix-based systems, use cURL to send a test query:
+```
+curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":5,"doHighlighting":true,"doFaceting":true}'
+```
+The response contains the documents matching the query term `R05` (the ICD-10 code for "Husten", i.e. cough) if any are found in the index. Otherwise, the response indicates that no documents were found.
+
+On Windows, use PowerShell like this:
+```
+PS> $inputText=ConvertTo-Json @{query="R05";from=0;size=5;doHighlighting=$true;doFaceting=$true}
+PS> Invoke-RestMethod -Method POST -ContentType "application/json" -uri http://localhost:8080/search -Body $inputText
+```
+
+## Web Application configuration
+
+The Web application needs to know the URL of the ElasticSearch instance to connect to. The file `src/main/resources/application.properties` lists the properties
+
+* `spring.elasticsearch.uris`
+* `spring.elasticsearch.socket-timeout`
+* `spring.elasticsearch.username`
+* `spring.elasticsearch.password`
+
+where username and password are required when ElasticSearch security is enabled. The ElasticSearch Docker setup provided in this repository does not enable security. The only required property is `spring.elasticsearch.uris`, which must be set to at least one ElasticSearch instance URL. When running the application in a Docker container, the host `host.docker.internal` can be used to connect to an ElasticSearch instance running on the host, including one in another Docker container. Note, however, that for all-Docker setups, a Docker network should be created for more direct communication.
+
+To use an external configuration file, run `java -jar smithsearch.jar --spring.config.location=classpath:/another-location.properties`. If you use the Docker container, edit the Dockerfile `CMD` line accordingly and mount the configuration file into the container, e.g. by using the `-v` switch.
diff --git a/engine/src/main/resources/application.properties b/engine/src/main/resources/application.properties
index 68e7083..ff52f81 100644
--- a/engine/src/main/resources/application.properties
+++ b/engine/src/main/resources/application.properties
@@ -1,4 +1,4 @@
-spring.elasticsearch.uris=http://localhost:9200
+spring.elasticsearch.uris=http://host.docker.internal:9200
 spring.elasticsearch.socket-timeout=10s
 spring.elasticsearch.username=user
 spring.elasticsearch.password=secret
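
The request and response format documented in `engine/README.md` can also be exercised from a script. The following is a minimal sketch in Python and is not part of this patch. The helper names `build_search_request`, `summarize_response` and `search` are hypothetical, and the URL assumes the default port mapping used in the README examples. The field names are taken from the documented request body and from the example response in the root `README.md`.

```python
import json
import urllib.request

# Assumption: the service is reachable on the port used in the README examples.
SEARCH_URL = "http://localhost:8080/search"


def build_search_request(query, offset=0, size=10, highlight=True, facets=True):
    """Return the request body documented for POST /search as a dict."""
    return {
        "query": query,
        "from": offset,
        "size": size,
        "doHighlighting": highlight,
        "doFaceting": facets,
    }


def summarize_response(response):
    """Extract hit count, document IDs and entity ID counts from a response dict."""
    return {
        "numHits": response.get("numHits", 0),
        "docIds": [hit["docId"] for hit in response.get("hits", [])],
        "entityIdCounts": {
            facet["entityId"]: facet["count"]
            for facet in response.get("entityIdCounts", [])
        },
    }


def search(query, url=SEARCH_URL, **options):
    """POST the query to the search endpoint and return the parsed JSON response."""
    body = json.dumps(build_search_request(query, **options)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With a running service and a populated index, `summarize_response(search("R05", size=5))` condenses the response to the hit count, the returned document IDs and the entity ID counts. Only the standard library is used so that the sketch runs without extra dependencies.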