Deployment

Jonathan Mang edited this page Mar 17, 2022 · 6 revisions

Deployment of the DQA Tool

Docker

DQAstats (Command line version of the DQA Tool)

You can try out the package without installing anything except Docker. To do so, follow these steps:

  1. Make sure Docker and Docker Compose are installed.

  2. Clone the DQAstats repo:

    git clone https://gitlab.miracum.org/miracum/dqa/dqastats.git dqastats
    cd dqastats
  3. Run the containerized setup using

    docker-compose -f ./docker/docker-compose.yml up
  4. Open ./docker/output/ to view the generated report.
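
The four steps above can be wrapped in a small shell script. This is an illustrative sketch, not something shipped with DQAstats: the function names `require_cmds` and `run_dqastats_demo` are hypothetical, and the script assumes `git` and the standalone `docker-compose` binary are on your `PATH`. Re-running it reuses an existing `dqastats` checkout instead of cloning again.

```shell
#!/usr/bin/env bash
# Hypothetical helper script for the DQAstats Docker demo (not part of DQAstats).

# Print every missing command to stderr; return non-zero if any is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Steps 1-4: check prerequisites, clone once, run the containers, list the output.
run_dqastats_demo() {
  require_cmds git docker docker-compose || return 1
  if [ ! -d dqastats ]; then
    git clone https://gitlab.miracum.org/miracum/dqa/dqastats.git dqastats || return 1
  fi
  # Subshell, so the caller's working directory is left unchanged.
  (
    cd dqastats || exit 1
    docker-compose -f ./docker/docker-compose.yml up || exit 1
    ls -l ./docker/output/
  )
}

# Usage: run_dqastats_demo
```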

DQAgui (Browser based GUI version of the DQA Tool)

You can try out the package without installing anything except Docker. To do so, follow these steps:

  1. Make sure Docker and Docker Compose are installed.

  2. Clone the DQAgui repo:

    git clone --depth 1 -b development --single-branch https://gitlab.miracum.org/miracum/dqa/dqagui.git dqagui
    cd dqagui/docker
  3. Run the containerized setup using

    docker-compose up -d
  4. Open the GUI at http://localhost:3839 in your browser. For a quick introduction to the GUI using the demo data, see the DQAgui Intro.

  5. To stop the containers and return to the directory where you cloned the repo, run

    docker-compose down
    cd ../..
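
Because `docker-compose up -d` returns as soon as the containers are started, the GUI may not answer on port 3839 right away. If you script the setup, you can poll the URL before using it. A minimal sketch, assuming `curl` is installed; `wait_for_url` is a hypothetical helper, and treating any non-error HTTP response as "ready" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers or the attempt budget runs out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail turns HTTP errors (status >= 400) into a non-zero exit status.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $url" >&2
  return 1
}

# Usage, from the dqagui/docker directory:
#   docker-compose up -d
#   wait_for_url http://localhost:3839 && echo "DQAgui is ready"
```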

Advanced dockerized usage

If you want to use your own Docker Compose file and .env file(s), pass them to docker-compose like this:

docker-compose \
  -f docker-compose_miracum.yml \
  --env-file ../dqastats.env \
  up --build
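
Before starting containers with your own files, you can let Docker Compose validate them first: `docker-compose config --quiet` checks the compose file with the variables from the env file substituted and prints nothing on success. The wrapper below is a hypothetical sketch (`compose_with_env` is not part of DQAstats) and assumes a docker-compose release that supports `--env-file`, which the command above already requires.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: validate a compose file + env file, then run docker-compose.
# Arguments: compose file, env file, then the docker-compose subcommand and its flags.
compose_with_env() {
  local compose_file=$1 env_file=$2 f
  shift 2
  # Fail early with a clear message if either file is missing.
  for f in "$compose_file" "$env_file"; do
    if [ ! -f "$f" ]; then
      echo "file not found: $f" >&2
      return 1
    fi
  done
  # `config --quiet` only validates the resolved configuration.
  docker-compose -f "$compose_file" --env-file "$env_file" config --quiet || return 1
  docker-compose -f "$compose_file" --env-file "$env_file" "$@"
}

# Usage, equivalent to the command above:
#   compose_with_env docker-compose_miracum.yml ../dqastats.env up --build
```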

Debugging

The following snippets may help with debugging if something goes wrong:

## Open a console inside the container
## (in Git Bash on Windows, write //bin//bash to avoid path conversion):
docker run -it ghcr.io/miracum/dqastats:latest /bin/bash

## Installed R packages are stored in:
## "/usr/local/lib/R/site-library" and
## "/usr/local/lib/R/library"

## Start R inside the container:
R

## Run the DQA on the bundled example data (R code):
Sys.setenv("EXAMPLECSV_SOURCE_PATH" = system.file("demo_data", package = "DQAstats"))
Sys.setenv("EXAMPLECSV_TARGET_PATH" = system.file("demo_data", package = "DQAstats"))
tmp <- DQAstats::dqa(
  source_system_name = "exampleCSV_source",
  target_system_name = "exampleCSV_target",
  utils_path = system.file("demo_data/utilities", package = "DQAstats"),
  mdr_filename = "mdr_example_data.csv",
  output_dir = "/data/output",
  logfile_dir = "/data/logs"
)
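
When you run the example from a plain console as above, the report is written to `/data/output` inside the container only. A hypothetical variant that bind-mounts host folders onto `/data/output` and `/data/logs` keeps the report and logs on your machine after the container exits. The function name `dqastats_console` and the host folder layout (`output/` and `logs/` below a folder of your choice) are assumptions, not part of DQAstats.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open a console in the DQAstats image with host folders
# mounted at /data/output and /data/logs (the paths used by the R example above).
# Argument: host folder that will receive output/ and logs/ (default: current dir).
dqastats_console() {
  local host_dir=${1:-$PWD} abs
  mkdir -p "$host_dir/output" "$host_dir/logs" || return 1
  # Bind mounts need absolute host paths.
  abs=$(cd "$host_dir" && pwd) || return 1
  # --rm removes the container on exit; the mounted folders stay on the host.
  docker run -it --rm \
    -v "$abs/output:/data/output" \
    -v "$abs/logs:/data/logs" \
    ghcr.io/miracum/dqastats:latest /bin/bash
}

# Usage: dqastats_console ./dqa-run   # then start R and run the example above
```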

Kubernetes

Background

The manifest ./docker/dqastats-workflow.yaml uses Argo Workflows to schedule the dockerized version of DQAstats so that it runs a data quality (DQ) analysis on a regular basis.

How to use

  1. Install KinD (Kubernetes in Docker), kubectl, and Helm.

  2. Create a local cluster for testing:

    kind create cluster
  3. Install Argo Workflows:

    ## Add the Bitnami Helm repo, which provides the Argo Workflows chart:
    helm repo add bitnami https://charts.bitnami.com/bitnami
    
    ## Install Argo Workflows, setting a custom name for the server's service account:
    helm install argo-wf bitnami/argo-workflows \
        --set server.serviceAccount.name=argo-wf-san
  4. Follow the instructions printed in the console after the installation to obtain the Bearer token; they may look similar to the following:

    ## Note: If you changed the service account name `argo-wf-san`
    ## in the `helm install ...` command above,
    ## you also need to change it here:
    SECRET=$(kubectl get sa argo-wf-san -o=jsonpath='{.secrets[0].name}')
    ARGO_TOKEN="Bearer $(kubectl get secret "$SECRET" -o=jsonpath='{.data.token}' | base64 --decode)"
    echo "$ARGO_TOKEN"
  5. Adapt the manifest ./docker/dqastats-workflow.yaml to your needs, or keep it unchanged for demo purposes.

  6. Apply the secret and the workflow to the cluster:

    kubectl apply -f ./docker/dqastats-secret.yaml
    kubectl apply -f ./docker/dqastats-workflow.yaml
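
The token commands in step 4 rely on the service account having an auto-generated token secret. Since Kubernetes 1.24, such secrets are no longer created by default, so `.secrets[0].name` comes back empty on newer clusters; there, `kubectl create token` (kubectl 1.24 or later) issues a short-lived token instead. The sketch below tries that first and falls back to the secret-based lookup from step 4. The function name `get_argo_token` is hypothetical; `argo-wf-san` is the service account name set in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an Argo "Bearer ..." token for a service account.
# Argument: service account name (default: argo-wf-san, as set in step 3).
get_argo_token() {
  local sa=${1:-argo-wf-san} token secret
  # Kubernetes/kubectl >= 1.24: request a short-lived token directly.
  if ! token=$(kubectl create token "$sa" 2>/dev/null); then
    # Older clusters: read the auto-generated token secret, as in step 4.
    secret=$(kubectl get sa "$sa" -o=jsonpath='{.secrets[0].name}') || return 1
    if [ -z "$secret" ]; then
      echo "no token secret found for service account $sa" >&2
      return 1
    fi
    token=$(kubectl get secret "$secret" -o=jsonpath='{.data.token}' | base64 --decode) || return 1
  fi
  printf 'Bearer %s\n' "$token"
}

# Usage:
#   ARGO_TOKEN="$(get_argo_token argo-wf-san)"
#   echo "$ARGO_TOKEN"
```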

Thanks

🎉 Big thanks to @christian.gulden / @chgl for all the Kubernetes support! The draft of this "How to ..." section is adapted from his work, originally published here: https://gitlab.miracum.org/miracum/charts/-/blob/master/README.md.