A starting point for a data stack using Python, Apache Airflow and Metabase.

ericdaat/data-stack

Data Stack

1. Presentation

A sample data stack running on Docker that contains the following components:

  • Airflow
  • Metabase
  • MariaDB, with phpMyAdmin
  • Postgres, with phpPgAdmin
  • Doccano data labelling interface
  • Nginx as a reverse proxy
  • Sphinx auto-generated documentation
  • A template Python module, usable in Airflow DAGs
  • A template machine learning package, using PyTorch
  • An ml_helper package that provides functions to store machine learning model results and parameters in a database
  • A utils package with utility functions
  • Unit testing with the pytest library
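To illustrate the kind of bookkeeping the ml_helper package is described as doing, here is a minimal sketch of storing a model's parameters and results in a database. It is not the package's actual API: the function names and the model_results table are hypothetical, and it uses Python's built-in sqlite3 with an in-memory database instead of the stack's MariaDB or Postgres so that it runs on its own.

```python
import json
import sqlite3


def init_results_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) model_results table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS model_results (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model_name TEXT NOT NULL,
            params TEXT NOT NULL,   -- hyperparameters serialized as JSON
            metrics TEXT NOT NULL   -- evaluation results serialized as JSON
        )
        """
    )


def save_model_results(conn: sqlite3.Connection, model_name: str,
                       params: dict, metrics: dict) -> int:
    """Insert one training run and return its row id."""
    cur = conn.execute(
        "INSERT INTO model_results (model_name, params, metrics) VALUES (?, ?, ?)",
        (model_name, json.dumps(params, sort_keys=True), json.dumps(metrics, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def load_model_results(conn: sqlite3.Connection, model_name: str) -> list:
    """Return every (params, metrics) pair stored for a model, oldest first."""
    rows = conn.execute(
        "SELECT params, metrics FROM model_results WHERE model_name = ? ORDER BY id",
        (model_name,),
    ).fetchall()
    return [(json.loads(p), json.loads(m)) for p, m in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_results_table(conn)
    save_model_results(conn, "classifier", {"lr": 0.01, "epochs": 5}, {"accuracy": 0.9})
    print(load_model_results(conn, "classifier"))
```

Serializing parameters and metrics as JSON keeps the schema fixed while letting each model log whatever keys it needs. A real implementation against MariaDB or Postgres would use that database's driver and placeholder style instead of sqlite3's `?`.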

2. Installation

You will need to have the following software installed:

Once everything is installed, create a virtual environment and install the prerequisite Python libraries:

virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;
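If you want to confirm that the activated environment is the one Python is actually running in before installing packages, a small check like the following can help. It is an illustrative helper (the name in_virtualenv is hypothetical, not part of this repository) that relies only on the standard sys module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Both the venv module and modern virtualenv releases point sys.prefix at the
    # environment directory, while sys.base_prefix keeps pointing at the base Python.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Very old virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```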

3. Usage

3.1 Launch the Docker stack

Run it with:

docker-compose up -d

Then visit:

Add your Airflow DAGs to the dags folder.

3.2 Unit testing

Run the unit tests with:

pytest tests
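By default, pytest collects files named test_*.py and runs the functions in them whose names start with test_, so a new test module in the tests folder could look like the sketch below. The file name and the normalize_column_name function are hypothetical; in the project the function under test would be imported from the code in ./src rather than defined inline.

```python
# tests/test_example.py (hypothetical file name)


def normalize_column_name(name: str) -> str:
    """Hypothetical helper: lower-case a column name and replace spaces with underscores."""
    # In the real project this would be imported from the module under ./src.
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("column name must not be empty")
    return cleaned.lower().replace(" ", "_")


def test_normalize_column_name_basic():
    assert normalize_column_name("Order Date") == "order_date"


def test_normalize_column_name_strips_whitespace():
    assert normalize_column_name("  Price ") == "price"


def test_normalize_column_name_rejects_empty():
    # pytest.raises(ValueError) is the usual pytest idiom; a plain try/except
    # is used here so the module has no imports.
    try:
        normalize_column_name("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty name")
```

Plain assert statements are enough: pytest rewrites them to show the compared values when a test fails.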

3.3 Generating the Sphinx docs

Generate the Sphinx documentation with:

sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;
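sphinx-apidoc writes .rst stubs for the modules under ./src (the -M flag puts each package's own documentation before its submodules), and the HTML pages are then filled from the modules' docstrings. Below is a sketch of a module whose docstrings render cleanly with the reStructuredText field-list style (:param:, :returns:, :rtype:). It assumes sphinx.ext.autodoc is enabled in docs/source/conf.py; the module and the parse_price function are hypothetical.

```python
"""Helpers for cleaning raw records (hypothetical module under ./src)."""


def parse_price(raw: str, default: float = 0.0) -> float:
    """Convert a raw price string such as ``"$1,200.50"`` to a float.

    :param raw: The price as it appears in the source data.
    :param default: Value returned when ``raw`` cannot be parsed.
    :returns: The parsed price, or ``default`` if parsing fails.
    :rtype: float

    >>> parse_price("$1,200.50")
    1200.5
    >>> parse_price("n/a", default=-1.0)
    -1.0
    """
    # Drop the currency symbol and thousands separators before converting.
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return default
```

The examples in the docstring double as doctests, so they can also be checked with `python -m doctest` on the module file.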

4. References