From 5d073d0899f64a62b45393c12f5418250f05e991 Mon Sep 17 00:00:00 2001
From: Trym bremnes
Date: Thu, 10 Aug 2023 08:01:29 +0200
Subject: [PATCH] feat: add `Dockerfile` template

# Motivation

More and more applications are being run natively in containers, for a
variety of reasons. This commit is the initial attempt to provide
containerized support for `SolrWayback`.

# Usage

To get going, simply run the following commands:

```bash
docker build . --tag solrwayback
docker run --publish 8080:8080 --publish 8983:8983 --volume \
    :/host_dir --tty --interactive solrwayback bash
```

where the host path before `:/host_dir` only contains `WARC` files and
directories. For more details, please refer to the comments in the
`Dockerfile`.

# Implementation

The `Dockerfile` was created by following the instructions in the
top-level readme. In addition to this, a few verification steps were
added to ensure that the container works as expected.

A simple test that uses `docker` to build the container has been added.

# Drawbacks

## No proper indexing test

The added test does not verify that indexing works as expected. This
should be done by adding some `WARC` files as test data, but this is
outside the scope of this commit.

## No automatic update to latest release

Whenever someone releases a new version of `SolrWayback`, there is no
reminder or automatic failure if the relevant people forget to update
the `Dockerfile`.

# Future work

## Index preservation

The `Dockerfile` currently does not preserve the created index, so it
needs to be manually copied out of the container in order to preserve
it. With the latest released `SolrWayback` bundle, the index can be
found here:

`unpacked-bundle/solrwayback_package_4.4.2/solr-7.7.3/server/solr/configsets/netarchivebuilder/netarchivebuilder_data/index/`

## Custom configuration of `properties`

There is currently no way of using your own `solrwayback.properties` and
`solrwaybackweb.properties`, which is essential for using the correct
branding.
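
## Sketch: copying the index out manually

Until index preservation is handled by the `Dockerfile`, the manual copy
described under "Index preservation" could look roughly like the sketch
below. It is not part of this commit. The helper names `index_path` and
`backup_index` are hypothetical, and the sketch assumes the container
layout produced by this `Dockerfile` (working directory `/home/builder`)
and that `docker cp` is run from the host.

```bash
#!/usr/bin/env bash

# Print the absolute index path inside the container.
# Assumes WORKDIR /home/builder, as set in the Dockerfile.
# Usage: index_path <solrwayback-version> <solr-version>
index_path() {
    local solrwayback_version="$1"
    local solr_version="$2"
    printf '%s\n' "/home/builder/unpacked-bundle/solrwayback_package_${solrwayback_version}/solr-${solr_version}/server/solr/configsets/netarchivebuilder/netarchivebuilder_data/index"
}

# Copy the index from a container to a directory on the host.
# Hypothetical helper; the container name and target are supplied by the caller.
# Usage: backup_index <container> <host-target-dir>
backup_index() {
    local container="$1"
    local target="$2"
    docker cp "${container}:$(index_path 4.4.2 7.7.3)" "${target}"
}
```

With a container started under a known name, this would be called as
`backup_index <container> ./index-backup`. The versions are hard-coded to
match the `ENV` values in the `Dockerfile` and would need updating
together with them.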
---
 .github/workflows/test.yml | 14 +++++++++
 Dockerfile                 | 61 ++++++++++++++++++++++++++++++++++++++
 README.md                  |  3 +-
 3 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 .github/workflows/test.yml
 create mode 100644 Dockerfile

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
new file mode 100644
index 00000000..be83cc12
--- /dev/null
+++ b/.github/workflows/test.yml
@@ -0,0 +1,14 @@
+name: "Test"
+
+on:
+  push:
+
+jobs:
+  build-docker-image:
+    name: Build Docker image
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+      - name: Build SolrWayback Docker image
+        run: docker build --tag solrwayback .
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 00000000..2146e0a4
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,61 @@
+# This dockerfile sets up the SolrWayback bundle and attempts to run both Solr and Tomcat.
+# To build the image, run:
+# docker build . --tag solrwayback
+
+# To run SolrWayback, you need to launch it with the following parameters
+# docker run --publish 8080:8080 --publish 8983:8983 --volume :/host_dir --tty --interactive solrwayback bash
+# where the value before :/host_dir is a file path that only contains WARC files and directories.
+
+# When the container is running, run the following commands to start Solr and Tomcat:
+# export SOLRWAYBACK_VERSION=4.4.2
+# export APACHE_TOMCAT_VERSION=8.5.60
+# export SOLR_VERSION=7.7.3
+# ./unpacked-bundle/solrwayback_package_$SOLRWAYBACK_VERSION/solr-$SOLR_VERSION/bin/solr start
+# ./unpacked-bundle/solrwayback_package_$SOLRWAYBACK_VERSION/apache-tomcat-$APACHE_TOMCAT_VERSION/bin/startup.sh
+
+# You should now verify that the following links work in a browser:
+# http://localhost:8080/solrwayback/
+# http://localhost:8983/solr/#/
+
+# If you have some WARC files you want to index, you can index them with the following commands:
+# WARC_FILES=$(find /host_dir/ -type f)
+# ./unpacked-bundle/solrwayback_package_$SOLRWAYBACK_VERSION/indexing/warc-indexer.sh $WARC_FILES
+
+FROM ubuntu:22.04
+
+ENV SOLRWAYBACK_VERSION 4.4.2
+ENV APACHE_TOMCAT_VERSION 8.5.60
+ENV SOLR_VERSION 7.7.3
+
+RUN apt-get update --assume-yes --quiet
+RUN apt-get install wget unzip --assume-yes --quiet
+
+# Install dependencies
+RUN apt-get install default-jre lsof curl --assume-yes --quiet
+
+RUN useradd --create-home --shell /bin/bash builder
+RUN chown builder:builder /home/builder -R
+
+USER builder
+WORKDIR /home/builder
+
+# Download and unpack SolrWayback bundle
+RUN mkdir --parents solrwayback-zip
+RUN wget --quiet https://github.com/netarchivesuite/solrwayback/releases/download/${SOLRWAYBACK_VERSION}/solrwayback_package_${SOLRWAYBACK_VERSION}.zip \
+    --output-document solrwayback-zip/bundle.zip
+
+RUN mkdir unpacked-bundle
+RUN unzip -q solrwayback-zip/bundle.zip -d unpacked-bundle
+RUN rm --recursive solrwayback-zip
+
+# Set up SolrWayback configuration
+RUN cp unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/properties/solrwayback.properties .
+RUN cp unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/properties/solrwaybackweb.properties .
+
+# Verify that apache-tomcat works
+RUN unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/apache-tomcat-${APACHE_TOMCAT_VERSION}/bin/startup.sh
+RUN unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/apache-tomcat-${APACHE_TOMCAT_VERSION}/bin/shutdown.sh
+
+# Verify that solr works
+RUN unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/solr-${SOLR_VERSION}/bin/solr start
+RUN unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/solr-${SOLR_VERSION}/bin/solr stop -all
diff --git a/README.md b/README.md
index 1dcf7e8e..5f992388 100644
--- a/README.md
+++ b/README.md
@@ -145,7 +145,8 @@ Documents in SolrWayback are indexed through the [warc-indexer](https://github.c
 
 ## Build and test with Docker
 
-Currently disabled as docker image has not been maintained.
+
+A containerized sample can be found [here](./Dockerfile)
 
 ## Contact
 Thomas Egense (thomas.egense@gmail.com)