An experimental poor man's data lake built with Dagster for data orchestration, DuckDB as a high-performance analytical database system, Metabase for data visualization and MinIO for persistent data storage.
Everything is shipped as Docker containers, with Traefik as a reverse proxy.
- Docker (v20.10 or later) installed on your system
- An active **Python environment** (ideally Python 3.9.8, for consistency) with Poetry installed (for more information, see the Dagster container README)
- npm and Node.js installed to run the launch scripts; see these instructions
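Before launching anything, it can help to confirm the prerequisites above are in place. The snippet below is a minimal sketch, not a script shipped with this repo: `version_ge`, `has_cmd` and `check_prereqs` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils or compatible).

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of this repo).
# ASSUMPTION: `sort -V` (version sort) is available, as in GNU coreutils.

# Return 0 if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Return 0 if the given command is available on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return non-zero if anything is missing or too old.
check_prereqs() {
  local status=0 docker_version cmd

  if has_cmd docker; then
    # Client version only, so this works even if the daemon is not running.
    docker_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
    if version_ge "$docker_version" "20.10"; then
      echo "OK      docker ${docker_version}"
    else
      echo "TOO OLD docker ${docker_version:-unknown} (need >= 20.10)"
      status=1
    fi
  else
    echo "MISSING docker"
    status=1
  fi

  # Python (3.9.8 recommended), Poetry, Node.js and npm only need to be present here.
  for cmd in python3 poetry node npm; do
    if has_cmd "$cmd"; then
      echo "OK      ${cmd}"
    else
      echo "MISSING ${cmd}"
      status=1
    fi
  done

  return "$status"
}
```

After sourcing the snippet, run `check_prereqs`; a non-zero exit status means at least one tool is missing or too old.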
```bash
# Build images
npm run d:build
# Start containers in detached mode
npm run d:up-d
# Alternatively, keep them attached
npm run d:up
# When you're done, shut them down...
npm run d:down
# ...and remove volumes if needed
npm run d:down:all
```
Alternatively, you can run the Docker script directly:
```bash
# make scripts/docker.sh executable
chmod u+x ./scripts/docker.sh
# list all running composed containers
./scripts/docker.sh ps
# build images
./scripts/docker.sh build
# launch containers
./scripts/docker.sh up -d
# remove containers and networks
./scripts/docker.sh down
# ...and so on!
```
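The commands above suggest that `scripts/docker.sh` forwards its arguments to Docker Compose. The snippet below is a hypothetical sketch of such a pass-through wrapper, not the repo's actual script: the `compose` function name is illustrative, and it assumes Docker Compose finds the stack's compose file on its own (default file in the current directory, or the `COMPOSE_FILE` environment variable).

```bash
#!/usr/bin/env bash
# Hypothetical pass-through wrapper in the spirit of scripts/docker.sh.
# ASSUMPTION: Docker Compose locates the compose file itself
# (default file in the working directory or the COMPOSE_FILE variable).

# Forward all arguments to Docker Compose, preferring the v2 plugin
# ("docker compose") and falling back to the standalone v1 binary.
compose() {
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  else
    docker-compose "$@"
  fi
}

# Only run when executed directly with at least one argument (e.g. ./docker.sh ps).
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  compose "$@"
fi
```

With this design, `./scripts/docker.sh up -d` becomes `docker compose up -d`, so any Compose subcommand works without the wrapper having to know about it.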
- The Dagster UI, Dagit, is accessible at http://orchestration.localhost
- Metabase with DuckDB driver is accessible at http://vizualization.localhost
- MinIO dashboard is accessible at http://storage.localhost
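Once the containers are up, a quick smoke test can confirm that Traefik routes each hostname above to a live service. This is a minimal sketch under stated assumptions, not part of the repo: `check_endpoint` and `check_all` are hypothetical names, Traefik is assumed to listen on port 80 of the local machine, and any 2xx or 3xx response is treated as "up". curl's `--resolve` option pins each `*.localhost` name to `127.0.0.1`, so the check does not depend on how the local resolver handles `.localhost`.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test for the services exposed through Traefik.
# ASSUMPTIONS: Traefik listens on 127.0.0.1:80, and a 2xx/3xx status means "up".

SERVICES="orchestration.localhost vizualization.localhost storage.localhost"

# Print UP/DOWN for one host; return 0 only on a 2xx or 3xx status.
check_endpoint() {
  local host="$1" code
  # -w prints the status code; curl prints 000 when the connection fails.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    --resolve "${host}:80:127.0.0.1" "http://${host}/")"
  case "$code" in
    2??|3??) echo "UP   ${host} (HTTP ${code})"; return 0 ;;
    *)       echo "DOWN ${host} (HTTP ${code:-none})"; return 1 ;;
  esac
}

# Check every service and return non-zero if any of them is down.
check_all() {
  local host status=0
  for host in $SERVICES; do
    check_endpoint "$host" || status=1
  done
  return "$status"
}
```

After sourcing it, run `check_all` once `npm run d:up-d` has finished. Some services (Metabase in particular) can take a while to start, so a DOWN right after launch is not necessarily an error.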
Contribution guidelines will be posted soon!
- Cover image from Eaton, Elon Howard. Birds of New York. pt. 1 (1910). Art by Louis Agassiz Fuertes. Contributed in BHL from Gerstein Science Information Centre (https://s.si.edu/2LlwjIL)
- Special thanks to MileTwo for their Docker multi-stage build template for Dagster.
- Every building block of this repo is an open-source project. s/o to every contributor involved!