Merge branch 'release/1.3.0'
Slawomir Wieczorek committed Sep 20, 2024
2 parents 9dd044e + 782ad29 commit d729831
Showing 8 changed files with 200 additions and 178 deletions.
4 changes: 2 additions & 2 deletions hello-data-deployment/docker-compose/.env
@@ -1,7 +1,7 @@
COMPOSE_PROJECT_NAME=hellodata
COMPOSE_PATH_SEPARATOR=:
COMPOSE_FILE=docker-compose.yaml:base/base-services.yaml:default_data_domain/default-data-domain-services.yaml:extra_data_domain/extra-data-domain-services.yaml

#COMPOSE_FILE=docker-compose.yaml:base/base-services.yaml:default_data_domain/default-data-domain-services.yaml
COMPOSE_FILE=docker-compose.yaml:base/base-services.yaml:default_data_domain/default-data-domain-services.yaml:default_data_domain/default-data-domain-jupyterhub.yaml:extra_data_domain/extra-data-domain-services.yaml
REDIS_HOST=redis
REDIS_PORT=6379
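
The new `COMPOSE_FILE` value adds `default_data_domain/default-data-domain-jupyterhub.yaml` to the list. Compose splits this variable on `COMPOSE_PATH_SEPARATOR` and treats each entry like one `-f` argument. The sketch below only illustrates that expansion; `compose_file_args` is a made-up helper name, not something the repository ships.

```sh
# Hypothetical helper: print the -f arguments that a COMPOSE_FILE value
# corresponds to. $1 is the value, $2 the separator (defaults to ":").
compose_file_args() {
  old_ifs=$IFS
  IFS=${2:-:}
  out=""
  # Unquoted on purpose so the shell splits $1 on the separator.
  for f in $1; do
    out="$out -f $f"
  done
  IFS=$old_ifs
  printf '%s\n' "${out# }"
}

COMPOSE_PATH_SEPARATOR=:
COMPOSE_FILE=docker-compose.yaml:base/base-services.yaml:default_data_domain/default-data-domain-services.yaml:default_data_domain/default-data-domain-jupyterhub.yaml:extra_data_domain/extra-data-domain-services.yaml
compose_file_args "$COMPOSE_FILE" "$COMPOSE_PATH_SEPARATOR"
```

Dropping the JupyterHub entry from the list should be enough to leave those services out of the stack.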

38 changes: 28 additions & 10 deletions hello-data-deployment/docker-compose/README.md
@@ -2,7 +2,7 @@

### Operating system

#### At least 10GB of RAM available for docker under following systems
#### At least 15GB of RAM available for docker under following systems (we recommend a decent machine with at least 32GB of physical memory)

- Linux
- MacOS
@@ -19,8 +19,12 @@
Make sure the `host.docker.internal` is added to the /etc/hosts file with either of these options:

- **Linux/MacOS**: add `127.0.0.1 host.docker.internal` to the `/etc/hosts` file.
- **Windows**: Enable in Docker Desktop under `Settings -> General -> Use WSL 2 based engine` the settings: `Add the *.docker.internal names to the host's etc/hosts file (Requires password)`
- Make sure Docker-Desktop entered it correctly in `C:\Windows\System32\drivers\etc\hosts`. There were some [cases](https://github.com/kanton-bern/hellodata-be/issues/21#issuecomment-1913578206) where it's wrong OR the IP is an old one. This will cause an error to communicate between the containers. It should look something like this with your current IP placed:
- **Windows**: Enable in Docker Desktop under `Settings -> General -> Use WSL 2 based engine` the settings:
`Add the *.docker.internal names to the host's etc/hosts file (Requires password)`
- Make sure Docker-Desktop entered it correctly in `C:\Windows\System32\drivers\etc\hosts`. There were
some [cases](https://github.com/kanton-bern/hellodata-be/issues/21#issuecomment-1913578206) where it's wrong OR the
IP is an old one. This will cause an error to communicate between the containers. It should look something like
this with your current IP placed:

```sh
# Added by Docker Desktop
@@ -31,15 +35,18 @@ Make sure the `host.docker.internal` is added to the /etc/hosts file with either
# End of section
```

Also be sure there is **no Postgres running** on port 5432 as these will conflict with the upspinning Postgres of HelloDATA.
Also be sure there is **no Postgres running** on port 5432 as these will conflict with the upspinning Postgres of
HelloDATA.

If you are on [Mac](#mac), [Windows](#windows) or general [FAQ](#faq), please check the enhanced instructions on the bottom.
If you are on [Mac](#mac), [Windows](#windows) or general [FAQ](#faq), please check the enhanced instructions on the
bottom.
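
The two prerequisites above (a `host.docker.internal` entry in the hosts file and a free port 5432) can be checked before starting the stack. The following is a rough sketch, not part of the repository: the function names are hypothetical, and `port_in_use` assumes a `nc` build that supports `-z`.

```sh
# Hypothetical preflight helpers; names are illustrative only.

# Succeeds if the given hosts file has an active (uncommented) line that
# mentions host.docker.internal.
has_docker_internal_entry() {
  hosts_file=${1:-/etc/hosts}
  grep -v '^[[:space:]]*#' "$hosts_file" | grep -qw 'host\.docker\.internal'
}

# Succeeds if something already accepts connections on the given local port.
port_in_use() {
  nc -z localhost "$1" >/dev/null 2>&1
}

if ! has_docker_internal_entry /etc/hosts; then
  echo "host.docker.internal is missing from /etc/hosts"
fi
if port_in_use 5432; then
  echo "port 5432 is already taken; stop the local Postgres first"
fi
```

On Windows the file to pass would be the `C:\Windows\System32\drivers\etc\hosts` path as seen from your shell (for example under WSL).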

## Quick Start

First **pull and build** all required images.

Please don't forget to run it again after some time in order to fetch the latest changes, or use command below to always **fetch/build** before start (takes longer).
Please don't forget to run it again after some time in order to fetch the latest changes, or use command below to always
**fetch/build** before start (takes longer).

```sh
docker-compose pull
@@ -69,15 +76,26 @@ After all started, go to [localhost:8080](http://localhost:8080/) in your browse

## FAQ

- **Filebrowser login:** `admin/admin`. After successful login, the user should see the dbt-docs shared storage. Also, files can be opened in local file explorer from `./docker-compose/shared` path.
- **Filebrowser login:** `admin/admin`. After successful login, the user should see the dbt-docs shared storage. Also,
files can be opened in local file explorer from `./docker-compose/shared` path.

## Mac

- **Mac**: And for images unlike intel (e.g. Apple Silicon), ensure that you are on latest Docker Desktop, and that you enable `Use Rosetta for x86/amd64 emulation on Apple Silicon` under `Settings -> General`. This setting substantially boosts the speed of non-native containers. Find more on [Docker Desktop Settings](https://docs.docker.com/desktop/settings/mac/?uuid=740D92D0-4D7C-4DD7-9DFD-8AF8D62F42F7) and [Multi-platform images](https://docs.docker.com/build/building/multi-platform/).
- **Platform architecture:** If you are on a Mac or another `arm64` architecture, you most likely get the message `requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested`. It should still work, but much slower.
- **Mac**: And for images unlike intel (e.g. Apple Silicon), ensure that you are on latest Docker Desktop, and that you
enable `Use Rosetta for x86/amd64 emulation on Apple Silicon` under `Settings -> General`. This setting substantially
boosts the speed of non-native containers. Find more
on [Docker Desktop Settings](https://docs.docker.com/desktop/settings/mac/?uuid=740D92D0-4D7C-4DD7-9DFD-8AF8D62F42F7)
and [Multi-platform images](https://docs.docker.com/build/building/multi-platform/).

- **Mac**: Also enable this setting under `Settings -> Resources -> Network`: `Use kernel networking for UDP
Use a more efficient kernel networking path for UDP. This may not be compatible with your VPN software.`

- **Platform architecture:** If you are on a Mac or another `arm64` architecture, you most likely get the message
`requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested`.
It should still work, but much slower.

## Windows

- If you use Windows native (not WSL or WSL2), ensure the LF (line feeds) are defined in Linux style. Use a tool like [dos2unix](https://linux.die.net/man/1/dos2unix) to
- If you use Windows native (not WSL or WSL2), ensure the LF (line feeds) are defined in Linux style. Use a tool
like [dos2unix](https://linux.die.net/man/1/dos2unix) to
convert, or make sure in your IDE (e.g., IntelliJ has the option to set).
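
The Windows note above concerns CRLF line endings in files that end up inside Linux containers. Where `dos2unix` is not installed, a rough equivalent can be sketched with standard tools. `has_crlf` and `strip_cr` are hypothetical names, not part of the repository.

```sh
# Hypothetical helpers for spotting and fixing Windows line endings.

# Succeeds if the file contains at least one carriage return.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Prints the file to stdout with every carriage return removed.
strip_cr() {
  tr -d '\r' < "$1"
}

# Example use (file name is a placeholder):
#   has_crlf entrypoint.sh && strip_cr entrypoint.sh > entrypoint.unix.sh
```

Writing to a new file rather than editing in place keeps the original around in case the conversion was not wanted.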
55 changes: 20 additions & 35 deletions hello-data-deployment/docker-compose/base/base-services.yaml
@@ -118,12 +118,12 @@ services:
environment:
POSTGRES_PASSWORD: postgres
POSTGRES_MAX_CONNECTIONS: 200
volumes:
- postgres-volume:/var/lib/postgresql/data
# volumes:
# - postgres-volume:/var/lib/postgresql/data
healthcheck:
test: [ "CMD", "pg_isready", "-U", "postgres" ]
interval: 5s
retries: 5
retries: 20
restart: always

keycloak:
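
The `retries` bumps in this file lengthen how long a service may keep failing its probe before Docker marks it unhealthy. As a back-of-the-envelope sketch, assuming the window is roughly `interval * retries` and ignoring probe run time and any `start_period`, the postgres change from 5 to 20 retries at a 5s interval works out as follows. `grace_seconds` is a made-up helper name.

```sh
# Rough estimate only: seconds of consecutive failed probes tolerated
# before a container is reported unhealthy. $1 = interval in seconds,
# $2 = retries.
grace_seconds() {
  echo $(( $1 * $2 ))
}

grace_seconds 5 5    # postgres before this commit: prints 25
grace_seconds 5 20   # postgres after this commit: prints 100
```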
@@ -137,16 +137,18 @@ services:
KEYCLOAK_IMPORT: /opt/keycloak/data/import/realm.json
DB_VENDOR: h2
KC_HEALTH_ENABLED: "true"
volumes:
- keycloakrealmvolume:/tmp
JAVA_OPTS_APPEND: "-Dcom:redhat:fips=false"
KC_SPI_CONFIG_UPDATE_MODE: "IGNORE"
KC_SPI_CONFIG_UPDATE_ENABLE: "false"
healthcheck:
test: [ "CMD", "curl", "http://localhost:8080/health/ready" ]
test: [ "CMD", "curl", "http://localhost:8080/realms/hellodata" ]
interval: 10s
timeout: 5s
retries: 15
retries: 50
restart: always
extra_hosts:
- "host.docker.internal:host-gateway"
entrypoint: [ "/opt/keycloak/bin/kc.sh", "start-dev", "--import-realm", "--hostname", "host.docker.internal", "--hostname", "localhost", "--http-enabled", "true" ]

redis:
platform: ${HD_PLATFORM}
@@ -158,10 +160,8 @@ services:
test: [ "CMD", "redis-cli", "ping" ]
interval: 5s
timeout: 30s
retries: 50
retries: 20
restart: always
volumes:
- redis_data:/data

nats:
platform: ${HD_PLATFORM}
@@ -335,7 +335,7 @@ services:
test: [ "CMD", "curl", "--fail", "http://${AIRFLOW_WEBSERVER_HOST_PORT:-8080}/health" ]
interval: 10s
timeout: 10s
retries: 5
retries: 20
restart: always
depends_on:
<<: *airflow-common-depends-on
@@ -350,7 +350,7 @@ services:
test: [ "CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"' ]
interval: 10s
timeout: 10s
retries: 5
retries: 20
restart: always
depends_on:
<<: *airflow-common-depends-on
@@ -367,7 +367,7 @@ services:
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
retries: 20
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
@@ -387,7 +387,7 @@ services:
test: [ "CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"' ]
interval: 10s
timeout: 10s
retries: 5
retries: 20
restart: always
depends_on:
<<: *airflow-common-depends-on
@@ -494,7 +494,7 @@ services:
test: [ "CMD", "curl", "--fail", "http://localhost:5555/" ]
interval: 10s
timeout: 10s
retries: 5
retries: 20
restart: always
depends_on:
<<: *airflow-common-depends-on
@@ -547,13 +547,6 @@ services:
restart: on-failure
extra_hosts:
- "host.docker.internal:host-gateway"
depends_on:
keycloak:
condition: service_healthy
postgres:
condition: service_healthy
hello-data-portal-api:
condition: service_healthy
ports:
- 8080:80

@@ -583,24 +576,13 @@ services:
- keycloak
- redis
- nats
- smtp4dev
- superset-app-default-data-domain
- superset-app-extra-data-domain
- airflow-webserver
- cloudbeaver
- filebrowser
- hello-data-airflow-sidecar
- hello-data-sidecar-cloudbeaver
- hello-data-dbt-docs-sidecar
- hello-data-superset-sidecar-default-data-domain
- hello-data-superset-sidecar-extra-data-domain
ports:
- 8081:8081
healthcheck:
test: [ "CMD", "curl", "http://hello-data-portal-api:8081/api/actuator/health" ]
interval: 10s
timeout: 5s
retries: 15
retries: 20

hello-data-portal-sidecar:
platform: ${HD_PLATFORM}
@@ -617,9 +599,12 @@ services:
- postgres
- keycloak
- hello-data-portal-api
- airflow-webserver
- cloudbeaver
depends_on:
- airflow-webserver
- cloudbeaver
- postgres
- keycloak
- redis
- nats
- hello-data-portal-api
@@ -0,0 +1,145 @@
#
# Copyright © 2024, Kanton Bern
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of the <organization> nor the
# names of its contributors may be used to endorse or promote products
# derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#

version: "3.7"

services:
jupyterhub-default-data-domain:
build:
context: ./base/jupyterhub
dockerfile: Dockerfile.jupyterhub
args:
JUPYTERHUB_VERSION: latest
restart: always
image: hellodata-jupyterhub-notebook-default-data-domain
container_name: jupyterhub-default-data-domain
networks:
- hello-data-network
volumes:
# The JupyterHub configuration file
- "./base/jupyterhub/jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py:ro"
# Bind Docker socket on the host, so we can connect to the daemon from
# within the container
- "/var/run/docker.sock:/var/run/docker.sock:rw"
# Bind Docker volume on host for JupyterHub database and cookie secrets
#- "jupyterhub-data:/data"
- "shared_scripts:/srv/jupyterhub/shared_scripts:ro"
ports:
- "8000:8000"
environment:
# This username will be a JupyterHub admin
JUPYTERHUB_ADMIN: admin
# All containers will join this network
DOCKER_NETWORK_NAME: hello-data-network
# JupyterHub will spawn this Notebook image for users
DOCKER_NOTEBOOK_IMAGE: hello-data-base-notebook:latest
# Notebook directory inside user image
DOCKER_NOTEBOOK_DIR: /home/jovyan/work
JUPYTERHUB_LOG_LEVEL: DEBUG
OAUTH2_CLIENT_ID: frontend-client
OAUTH2_CLIENT_SECRET: not-required
OAUTH2_ISSUER_URL: http://host.docker.internal:38080/realms/hellodata
OAUTH2_AUTHORIZE_URL: http://host.docker.internal:38080/realms/hellodata/protocol/openid-connect/auth
OAUTH2_TOKEN_URL: http://host.docker.internal:38080/realms/hellodata/protocol/openid-connect/token
OAUTH2_USERDATA_URL: http://host.docker.internal:38080/realms/hellodata/protocol/openid-connect/userinfo
OAUTH2_CALLBACK_URL: http://localhost:8000/hub/oauth_callback
OAUTH2_SCOPE: openid
OAUTH2_LOGIN_SERVICE: keycloak
OAUTH2_USERNAME_KEY: preferred_username
OAUTH2_USERDATA_PARAMS: state:state
OAUTH2_ALLOW_ALL: true
HUB_IP: jupyterhub-default-data-domain
JUPYTERHUB_CRYPT_KEY: GTxQ9tNJ3v5TA3ZrmFs4ZtW0yF3a2nLn+f6Pd8c+Z5E=

jupyterhub-proxy-default-data-domain:
image: jupyterhub/configurable-http-proxy:latest
container_name: jupyterhub-proxy-default-data-domain
environment:
- CONFIGPROXY_AUTH_TOKEN=s3cr3t
- HUB_IP=jupyterhub-default-data-domain
ports:
- "8001:8001"
depends_on:
- jupyterhub-default-data-domain
networks:
- hello-data-network

# just to build an image
jupyterhub-notebook-default-data-domain:
build:
context: ./base/jupyterhub
dockerfile: Dockerfile.base_notebook
environment:
- JUPYTER_ENABLE_LAB=yes
networks:
- hello-data-network
depends_on:
- jupyterhub-default-data-domain
command: echo "This service is only for building our custom image which will be spawned by jupyterhub"

hello-data-jupyterhub-gateway-default-data-domain:
image: bedag/hello-data-jupyterhub-gateway
env_file: .env
environment:
- JUPYTERHUB_SERVICE_NAME=host.docker.internal:8000
- SPRING_R2DBC_URL=r2dbc:postgresql://host.docker.internal:35432/hd_metainfo
- SPRING_R2DBC_USERNAME=postgres
- SPRING_R2DBC_PASSWORD=postgres
- HELLO_DATA_CONTEXTS_0=Data Domain | Default_Data_Domain | Default Data Domain
ports:
- "8088:8088"
- "8083:8082"
depends_on:
- postgres
- keycloak
- jupyterhub-default-data-domain
restart: always
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
- hello-data-network

hello-data-sidecar-jupyterhub-default-data-domain:
image: bedag/hello-data-sidecar-jupyterhub
env_file: .env
environment:
- HELLODATA_JUPYTERHUB_DWH_URL=jdbc:postgresql://host.docker.internal
- SPRING_DATASOURCE_URL=jdbc:postgresql://host.docker.internal:35432/hd_metainfo
- NATS_SPRING_SERVER=nats://nats:4222
- HELLO_DATA_INSTANCE_URL=http://localhost:8088/
- HELLO_DATA_INSTANCE_NAME=Jupyterhub Default Data Domain
- HELLO_DATA_CONTEXTS_0=Data Domain | Default_Data_Domain | Default Data Domain
- HELLO_DATA_SIDECAR_PUBLISH_INTERVAL_SECONDS=30
depends_on:
- postgres
- keycloak
- jupyterhub-default-data-domain
restart: always
ports:
- "8091:8089"
extra_hosts:
- "host.docker.internal:host-gateway"
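
The compose file above hard-codes `JUPYTERHUB_CRYPT_KEY` and `CONFIGPROXY_AUTH_TOKEN`, which is convenient for a local setup. For anything shared, fresh values could be generated along these lines. This is only a sketch: `random_hex` is a hypothetical helper, and it assumes `/dev/urandom`, `dd`, `od` and `tr` are available. JupyterHub's documentation suggests a hex-encoded 32-byte value for `JUPYTERHUB_CRYPT_KEY`.

```sh
# Hypothetical helper: print $1 random bytes as lowercase hex
# (2 * $1 characters) followed by a newline.
random_hex() {
  dd if=/dev/urandom bs="$1" count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
  echo
}

JUPYTERHUB_CRYPT_KEY=$(random_hex 32)
CONFIGPROXY_AUTH_TOKEN=$(random_hex 32)
export JUPYTERHUB_CRYPT_KEY CONFIGPROXY_AUTH_TOKEN
```

With the variables exported, the compose file could reference them as `${JUPYTERHUB_CRYPT_KEY}` and `${CONFIGPROXY_AUTH_TOKEN}` instead of literal values.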