[TIKA-3420] Set Tesseract OCR languages as Docker build args #2

Open
wants to merge 14 commits into
base: master
19 changes: 13 additions & 6 deletions docker-tool.sh
@@ -21,11 +21,13 @@ while getopts ":h" opt; do
 case ${opt} in
 h )
 echo "Usage:"
-echo " docker-tool.sh -h Display this help message."
-echo " docker-tool.sh build <TIKA_VERSION> Builds images for <TIKA_VERSION>."
-echo " docker-tool.sh test <TIKA_VERSION> Tests images for <TIKA_VERSION>."
-echo " docker-tool.sh publish <TIKA_VERSION> Publishes images for <TIKA_VERSION> to Docker Hub."
-echo " docker-tool.sh latest <TIKA_VERSION> Tags images for <TIKA_VERSION> as latest on Docker Hub."
+echo " docker-tool.sh -h Display this help message."
+echo " docker-tool.sh build <TIKA_VERSION> ['<TESSERACT_LANGUAGES>'] Builds images for <TIKA_VERSION> with optional [<TESSERACT_LANGUAGES>]."
+echo " docker-tool.sh test <TIKA_VERSION> Tests images for <TIKA_VERSION>."
+echo " docker-tool.sh publish <TIKA_VERSION> Publishes images for <TIKA_VERSION> to Docker Hub."
+echo " docker-tool.sh latest <TIKA_VERSION> Tags images for <TIKA_VERSION> as latest on Docker Hub."
+echo ""
+echo "Note: [<TESSERACT_LANGUAGES>] is optional and applies to the full image only; pass it to change the default \`tesseract-ocr\` installation languages."
 exit 0
 ;;
 \? )
@@ -58,13 +60,18 @@ test_docker_image() {
 shift $((OPTIND -1))
 subcommand=$1; shift
 version=$1; shift
+tesseract_languages=$1; shift

 case "$subcommand" in
 build)
+build_args="--build-arg TIKA_VERSION=${version}"
+if [[ ! -z "$tesseract_languages" ]]; then
+build_args="$build_args --build-arg TESSERACT_LANGUAGES='${tesseract_languages}'"
+fi
 # Build slim version with minimal dependencies
 docker build -t apache/tika:${version} --build-arg TIKA_VERSION=${version} - < minimal/Dockerfile --no-cache
 # Build full version with OCR, Fonts and GDAL
-docker build -t apache/tika:${version}-full --build-arg TIKA_VERSION=${version} - < full/Dockerfile --no-cache
+docker build -t apache/tika:${version}-full ${build_args} - < full/Dockerfile --no-cache
 ;;

 test)
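A note on the `build` branch: `build_args` is a plain string with single quotes embedded in it, and it is expanded unquoted on the `docker build` line. Bash does not re-parse quotes after expansion, so the quote characters reach Docker literally and a language list containing spaces is split into several words. A minimal sketch of one way around this, assuming bash, is to collect the arguments in an array. `build_full_args` is a hypothetical helper, not part of `docker-tool.sh`, and the language value shown in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: collect docker build arguments in a bash array so that a
# multi-word language list stays a single argument.
# build_full_args is a hypothetical helper, not part of docker-tool.sh.

build_full_args() {
  local version=$1
  local tesseract_languages=${2:-}   # optional; empty when omitted
  local args=(--build-arg "TIKA_VERSION=${version}")
  if [[ -n "$tesseract_languages" ]]; then
    # No inner quotes needed: the array element keeps the spaces intact.
    args+=(--build-arg "TESSERACT_LANGUAGES=${tesseract_languages}")
  fi
  # Print one argument per line so the result can be inspected.
  printf '%s\n' "${args[@]}"
}
```

Calling `build_full_args 2.0.0 "lang-a lang-b"` prints four lines, the last being `TESSERACT_LANGUAGES=lang-a lang-b`. In the script itself the same array could be passed directly as `"${args[@]}"` on the full-image `docker build` line instead of being printed.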