Commit

Add documentation.
khituras committed Dec 29, 2022
1 parent 53967c9 commit 1ed7738
Showing 5 changed files with 133 additions and 85 deletions.
91 changes: 9 additions & 82 deletions README.md
@@ -2,94 +2,21 @@

This search engine was developed in the context of the [SMITH](https://www.smith.care/de/) project. It specifically aims to offer search capabilities for medical text in German. Its main feature is the seamless integration of semantic concepts, called named entities, into the search index. This makes it possible to search for canonical IDs of diseases, medications or possibly other entity types. Thus, instead of providing the search engine with synonyms and writing variants of the same concept in order to retrieve as many relevant documents as possible, these steps are handled during preprocessing and woven into the index. The engine offers faceting and highlighting capabilities that work with normal text queries as well as entity IDs. Entity IDs and normal words can be used in arbitrary combinations since the entity IDs are just words from the perspective of the search index.

The code for the search engine consists of two parts, the indexing pipeline and this Web application. The indexing pipeline, which reads documents, detects entities and creates index documents, is found in this Git repository in the `smithsearch-indexing-pipeline` directory.

This application is built with [Spring Boot](https://spring.io/projects/spring-boot) and relies on [ElasticSearch](https://www.elastic.co/) for its search capabilities. The indexing pipeline is built with [UIMA](https://uima.apache.org/) using [JCoRe](https://github.com/JULIELab/jcore-base) components.

An ElasticSearch instance can be quickly provided using Docker. See the directory `../es-docker` for instructions.

## Quickstart

The quickest way to start up the pipeline application is to use the official Docker image like this:
```
docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
```

Then, the web service will be available at `http://localhost:8080/search`.

## Web service Usage

The Web service offers a REST interface to the `/search` endpoint. Search requests are sent there using the `POST` HTTP method. The request body must be a JSON object in the following format:

Consider this example request:
```json
{
"query": "...",
"from": 0,
"size": 10,
"doHighlighting": true,
"doFaceting": true
}
curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":1,"doHighlighting":true}'
```

Where
* `query` is an ElasticSearch [Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-simple-query-string-query.html) with flags set to `ALL`. This query allows boolean expressions using `+` as AND, `|` as OR and `-` as negation. Refer to the documentation for all query possibilities.
* `from` is a number that specifies the result offset from which the result documents should be returned. This can be used for result paging.
* `size` is a number that specifies the number of results to return beginning from `from`.
* `doHighlighting` is a boolean value, `true` or `false`. It toggles the creation of snippets that use HTML tags to mark query matches in the document text.
* `doFaceting` is a boolean value, `true` or `false`. It toggles the calculation of the top 10 entity IDs for the query result.

The following sections describe how to use the Web service in different scenarios.

### As a development version with Maven

Use Maven to quickly run the application without the need to build JAR files:
Given that matching documents exist in the search index, the response looks like this:

`./mvnw spring-boot:run`

### As a Java application

Compile the application into an executable JAR with the Maven command `mvn clean package`. The application can be run with a command like
```
java -jar target/smithsearch-1.0.0.jar
```

### As a Docker container

A Docker image with the search application has been published to Docker Hub as `julielab/smithsearch:1.0.0`. Alternatively, this GitHub repository contains a Dockerfile that can be used to create a local Docker image. The next sections show how to use a Docker container as a Web service. A running Docker installation is required.

All commands in this README specify the `--rm` option, which removes the container after it is stopped. Since the application does not have an internal state, it is not necessary to keep the container. The `-p 8080:8080` option maps the host port 8080 to the container-internal port 8080. The first number can be changed to use another host port.

#### Run the official Docker container from Docker Hub

On the command line, type
```
docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
```json
{"hits":[{"docId":"1234","text":"Eine 76-jährige Patientin meldet sich in der Sprechstunde an, weil sie seit einiger Zeit an Husten leidet. [...]","highlights":["Eine 76-jährige Patientin meldet sich in der Sprechstunde an, weil sie seit einiger Zeit an <em>Husten</em> leidet","einem Atemwegsinfekt mit Schnupfen, Gliederschmerzen, Abgeschlagenheit, leichtem Fieber und leichtem <em>Husten</em>"]}],"numHits":61,"numHitsRelation":"Eq","entityIdCounts":[{"entityId":"R05","count":61},{"entityId":"R06.0","count":14},{"entityId":"Z01.7","count":7},{"entityId":"R07.0","count":5},{"entityId":"E66.-","count":4},{"entityId":"I50.-","count":4},{"entityId":"R29.1","count":4},{"entityId":"B05.-","count":3},{"entityId":"B26.-","count":3},{"entityId":"G93.6","count":3}]}
```

This will download the official image, create and run a container. The web application is then available under port 8080 with path `/search`.

#### With a Docker image built from the repository code

The `Dockerfile` in the repository can be used to create a new, local Docker image from scratch. Run
```
docker build . -t mypsearchwebapp:1.0.0
```
to create a new image named `mypsearchwebapp` with version `1.0.0`. Create and run a container using
```
docker run --rm -p 8080:8080 mypsearchwebapp:1.0.0
```
just as with the official image.
Note how `Husten` is highlighted upon a search for `R05`. Also note the `entityId` counts where `R05` has the highest count, because it was the search query.
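
The response structure above can be processed with a few lines of client code. The following is a minimal sketch, not part of the repository: the helper names `highlighted_terms` and `top_entities` are hypothetical, the sample data is an abbreviated copy of the response shown above, and the code assumes that `highlights` and `entityIdCounts` may be absent when highlighting or faceting is switched off.

```python
import json
import re

# Abbreviated copy of the sample response shown above.
sample_response = json.loads("""
{"hits": [{"docId": "1234",
           "text": "Eine 76-jährige Patientin [...]",
           "highlights": ["seit einiger Zeit an <em>Husten</em> leidet",
                          "leichtem Fieber und leichtem <em>Husten</em>"]}],
 "numHits": 61,
 "numHitsRelation": "Eq",
 "entityIdCounts": [{"entityId": "R05", "count": 61},
                    {"entityId": "R06.0", "count": 14}]}
""")


def highlighted_terms(response):
    """Collect the distinct strings wrapped in <em> tags across all hits."""
    terms = set()
    for hit in response.get("hits", []):
        # Assumption: the key may be missing when doHighlighting is false.
        for snippet in hit.get("highlights", []):
            terms.update(re.findall(r"<em>(.*?)</em>", snippet))
    return sorted(terms)


def top_entities(response, limit=3):
    """Return (entityId, count) pairs, highest count first."""
    counts = response.get("entityIdCounts", [])
    ranked = sorted(counts, key=lambda c: c["count"], reverse=True)
    return [(c["entityId"], c["count"]) for c in ranked[:limit]]


print(highlighted_terms(sample_response))  # ['Husten']
print(top_entities(sample_response))       # [('R05', 61), ('R06.0', 14)]
```

The first call shows which surface forms matched the entity ID `R05`; the second yields the facet counts in a form that is easy to render as a filter list.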
The code for the search engine consists of two parts, the indexing pipeline and a Web application. The Web application code is located in the `engine` directory. The indexing pipeline, which reads documents, detects entities and creates index documents, is found in this Git repository in the `smithsearch-indexing-pipeline` directory.

This application is built with [Spring Boot](https://spring.io/projects/spring-boot) and relies on [ElasticSearch](https://www.elastic.co/) for its search capabilities. The indexing pipeline is built with [UIMA](https://uima.apache.org/) using [JCoRe](https://github.com/JULIELab/jcore-base) components.

### Testing the web service
On *nix-based systems, use cURL to send a test query:
```
curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":5,"doHighlighting":true,"doFaceting":true}'
```
The response contains the documents that match the query term `R05` (the ICD-10 code for "Husten", i.e. cough) if any are found in the index. Otherwise, the response will indicate that no documents were found.
An ElasticSearch instance can be quickly provided using Docker. See the directory `../es-docker` for instructions.

On Windows, use PowerShell like this:
```
PS> $inputText=ConvertTo-Json @{query="R05";from=0;size=5;doHighlighting=$true;doFaceting=$true}
PS> Invoke-RestMethod -Method POST -ContentType "application/json" -uri http://localhost:8080/search -Body $inputText
```
24 changes: 24 additions & 0 deletions engine/.mvn/maven-settings.xml
@@ -0,0 +1,24 @@
<?xml version="1.0" ?>
<settings>
<profiles>
<profile>
<id>sonatype-snapshots</id>
<repositories>
<repository>
<id>sonatype-nexus-snapshots</id>
<name>Sonatype Nexus Snapshots</name>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
</profile>
</profiles>
<activeProfiles>
<activeProfile>sonatype-snapshots</activeProfile>
</activeProfiles>
</settings>
3 changes: 1 addition & 2 deletions engine/Dockerfile
@@ -4,9 +4,8 @@ FROM eclipse-temurin:17-jdk-jammy AS build
WORKDIR /app
COPY .mvn/ .mvn
COPY mvnw pom.xml ./
#RUN ./mvnw dependency:resolve
COPY src ./src
RUN ./mvnw clean package --settings .mvn/maven-settings.xml
RUN ./mvnw clean package -DskipTests=true --settings .mvn/maven-settings.xml

FROM eclipse-temurin:17

98 changes: 98 additions & 0 deletions engine/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
## Quickstart

The quickest way to start up the Web application is to use the official Docker image like this:
```
docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
```

Then, the web service will be available at `http://localhost:8080/search`. An ElasticSearch instance will be expected at `http://localhost:9200`.

## Web service Usage

The Web service offers a REST interface to the `/search` endpoint. Search requests are sent there using the `POST` HTTP method. The request body must be a JSON object in the following format:

```json
{
"query": "...",
"from": 0,
"size": 10,
"doHighlighting": true,
"doFaceting": true
}
```

Where
* `query` is an ElasticSearch [Simple Query String Query](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-simple-query-string-query.html) with flags set to `ALL`. This query allows boolean expressions using `+` as AND, `|` as OR and `-` as negation. Refer to the documentation for all query possibilities.
* `from` is a number that specifies the result offset from which the result documents should be returned. This can be used for result paging.
* `size` is a number that specifies the number of results to return beginning from `from`.
* `doHighlighting` is a boolean value, `true` or `false`. It toggles the creation of snippets that use HTML tags to mark query matches in the document text.
* `doFaceting` is a boolean value, `true` or `false`. It toggles the calculation of the top 10 entity IDs for the query result.
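
The parameters above can be assembled programmatically. The following is a minimal sketch in Python, not part of the repository: the helper name `build_search_request` and its `page`/`page_size` arguments are hypothetical conveniences that translate a page number into the `from` offset. The example queries combine an entity ID with a plain word and with a second entity ID.

```python
import json


def build_search_request(query, page=0, page_size=10, highlight=True, facets=True):
    """Build the JSON body for a POST to /search (hypothetical helper)."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    return {
        "query": query,
        # `from` is the offset of the first hit, so page n starts at n * page_size.
        "from": page * page_size,
        "size": page_size,
        "doHighlighting": highlight,
        "doFaceting": facets,
    }


# Entity IDs are ordinary index terms, so they combine freely with plain words.
# In simple query string syntax, `+` is AND, `|` is OR and `-` negates a term.
first_page = build_search_request("R05 + Fieber")
third_page = build_search_request("R05 | R06.0", page=2, page_size=5)

print(json.dumps(first_page))
print(json.dumps(third_page))
```

Keeping the offset arithmetic in one place avoids off-by-one errors when paging through larger result sets.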

The following sections describe how to use the Web service in different scenarios.

### As a development version with Maven

Use Maven to quickly run the application without the need to build JAR files:

`./mvnw spring-boot:run`

### As a Java application

Compile the application into an executable JAR with the Maven command `mvn clean package`. The application can be run with a command like
```
java -jar target/smithsearch-1.0.0.jar
```

### As a Docker container

A Docker image with the search application has been published to Docker Hub as `julielab/smithsearch:1.0.0`. Alternatively, this GitHub repository contains a Dockerfile that can be used to create a local Docker image. The next sections show how to use a Docker container as a Web service. A running Docker installation is required.

All commands in this README specify the `--rm` option, which removes the container after it is stopped. Since the application does not have an internal state, it is not necessary to keep the container. The `-p 8080:8080` option maps the host port 8080 to the container-internal port 8080. The first number can be changed to use another host port.

#### Run the official Docker container from Docker Hub

On the command line, type
```
docker run --rm -p 8080:8080 julielab/smithsearch:1.0.0
```

This will download the official image, create and run a container. The web application is then available under port 8080 with path `/search`.

#### With a Docker image built from the repository code

The `Dockerfile` in the repository can be used to create a new, local Docker image from scratch. Run
```
docker build . -t mypsearchwebapp:1.0.0
```
to create a new image named `mypsearchwebapp` with version `1.0.0`. Create and run a container using
```
docker run --rm -p 8080:8080 mypsearchwebapp:1.0.0
```
just as with the official image.


### Testing the web service
On *nix-based systems, use cURL to send a test query:
```
curl -XPOST http://localhost:8080/search -H 'Content-Type: application/json' -d '{"query":"R05","from":0,"size":5,"doHighlighting":true,"doFaceting":true}'
```
The response contains the documents that match the query term `R05` (the ICD-10 code for "Husten", i.e. cough) if any are found in the index. Otherwise, the response will indicate that no documents were found.

On Windows, use PowerShell like this:
```
PS> $inputText=ConvertTo-Json @{query="R05";from=0;size=5;doHighlighting=$true;doFaceting=$true}
PS> Invoke-RestMethod -Method POST -ContentType "application/json" -uri http://localhost:8080/search -Body $inputText
```
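
As a platform-independent alternative to the two commands above, the same request can be sent from Python using only the standard library. This is a sketch under assumptions: the function names `make_request` and `search` are hypothetical, and the URL is the default from the Quickstart.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:8080/search"  # default from the Quickstart


def make_request(body, url=SEARCH_URL):
    """Wrap a request body in a POST request with a JSON content type."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(body, url=SEARCH_URL, timeout=10):
    """Send the request and decode the JSON response (needs a running service)."""
    with urllib.request.urlopen(make_request(body, url), timeout=timeout) as resp:
        return json.load(resp)


request_body = {"query": "R05", "from": 0, "size": 5,
                "doHighlighting": True, "doFaceting": True}
prepared = make_request(request_body)
# Only inspects the prepared request; nothing is sent at this point.
print(prepared.get_method(), prepared.full_url)
```

With the Web application and ElasticSearch running, `search(request_body)` returns the parsed response as a dictionary. Connection problems surface as `urllib.error.URLError`, which callers may want to catch.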

## Web Application configuration

The Web application needs to know the URL of the ElasticSearch instance to connect to. The file `src/main/resources/application.properties` lists the properties

* `spring.elasticsearch.uris`
* `spring.elasticsearch.socket-timeout`
* `spring.elasticsearch.username`
* `spring.elasticsearch.password`

where username and password are only required when ElasticSearch security is enabled. The ElasticSearch Docker setup provided in this repository does not enable security. The only mandatory property is `spring.elasticsearch.uris`, which must be set to at least one ElasticSearch instance URL. When running the application in a Docker container, the host name `host.docker.internal` can be used to connect to an ElasticSearch instance that is reachable on the host, including one that runs in another Docker container with a published port. Note, however, that for all-Docker setups, a Docker network should be created for more direct communication.

To use a different configuration file, use `java -jar smithsearch.jar --spring.config.location=classpath:/another-location.properties`. For a file outside the JAR, Spring Boot also accepts the `file:` prefix in place of `classpath:`. If you use the Docker container, edit the Dockerfile `CMD` line accordingly and mount the configuration file into the container, e.g. by using the `-v` switch.
2 changes: 1 addition & 1 deletion engine/src/main/resources/application.properties
@@ -1,4 +1,4 @@
spring.elasticsearch.uris=http://localhost:9200
spring.elasticsearch.uris=http://host.docker.internal:9200
spring.elasticsearch.socket-timeout=10s
spring.elasticsearch.username=user
spring.elasticsearch.password=secret
