A sample data stack running on Docker, containing the following components:
- Airflow
- Metabase
- MariaDB, with phpMyAdmin
- Postgres, with phpPgAdmin
- Doccano data labelling interface
- Nginx as reverse proxy
- Sphinx auto-generated documentation
- A template Python module, usable in Airflow DAGs
- A template machine learning package, using PyTorch
- An ml_helper package that provides functions to store machine learning model results and parameters in a database
- A utils package with utility functions
- Unit testing with the pytest library
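The actual interface of the ml_helper package is not described here. As a rough illustration of the idea, the sketch below records one training run's parameters and metrics as JSON in a table. It uses Python's built-in sqlite3 as a stand-in for the MariaDB or Postgres databases in the stack, and the names save_run, load_runs and model_runs are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical helpers for illustration only; not the real ml_helper API.
# sqlite3 stands in for the MariaDB/Postgres databases used by the stack.


def save_run(conn, model_name, params, metrics):
    """Store one run's parameters and results; return the new row id."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS model_runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            model_name TEXT NOT NULL,
            params TEXT NOT NULL,   -- JSON-encoded hyperparameters
            metrics TEXT NOT NULL,  -- JSON-encoded evaluation results
            created_at TEXT NOT NULL
        )
        """
    )
    cur = conn.execute(
        "INSERT INTO model_runs (model_name, params, metrics, created_at) "
        "VALUES (?, ?, ?, ?)",
        (
            model_name,
            json.dumps(params, sort_keys=True),
            json.dumps(metrics, sort_keys=True),
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return cur.lastrowid


def load_runs(conn, model_name):
    """Return all stored runs for a model as (params, metrics) tuples, oldest first."""
    rows = conn.execute(
        "SELECT params, metrics FROM model_runs WHERE model_name = ? ORDER BY id",
        (model_name,),
    ).fetchall()
    return [(json.loads(p), json.loads(m)) for p, m in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    save_run(conn, "classifier", {"lr": 0.001, "epochs": 10}, {"accuracy": 0.91})
    print(load_runs(conn, "classifier"))
```

Serializing parameters and metrics as JSON keeps the table schema fixed even when different models log different hyperparameters; a real implementation against Postgres could use a native JSON column instead.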
You will need to have the following software installed:
Once these are installed, create a virtual environment and install the prerequisite Python libraries:
virtualenv venv;
source venv/bin/activate;
pip install -r requirements.txt;
Start the stack with:
docker-compose up -d
Then visit:
- localhost:3000 for Metabase
- localhost:8080 for Airflow
- localhost:8000 for Doccano
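If a page does not load, a quick way to see which interfaces are accepting connections is a TCP port check. The sketch below uses only the standard library; the port mapping mirrors the list above and assumes docker-compose exposes the services on those host ports. The names SERVICES, is_port_open and check_services are hypothetical. An open port only shows that something is listening, so Airflow or Metabase may still be initializing.

```python
import socket

# Assumed host ports, taken from the list above; adjust if your compose file differs.
SERVICES = {"Metabase": 3000, "Airflow": 8080, "Doccano": 8000}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```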
Add your Airflow DAGs to the dags folder.
Run the unit tests with:
pytest tests
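By default, pytest collects files named test_*.py and runs the functions in them whose names start with test_, using plain assert statements. A minimal sketch of such a file follows; the file name and the normalize function are hypothetical stand-ins for code from the project's own packages.

```python
# tests/test_example.py (hypothetical file name)
import math


def normalize(values):
    """Scale a list of numbers so they sum to 1.

    Hypothetical stand-in for a function from src/; a real test would import it instead.
    """
    total = sum(values)
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # Floating-point sums can be slightly off, so compare with a tolerance.
    assert math.isclose(sum(normalize([1, 2, 3, 4])), 1.0)


def test_normalize_preserves_ratios():
    assert normalize([1, 3]) == [0.25, 0.75]
```

Running pytest tests from the repository root would pick up both functions; with the template packages in place, the test module would import from them rather than define normalize inline.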
Generate the Sphinx documentation with the following commands (open is macOS-specific; on Linux, use xdg-open instead):
sphinx-apidoc ./src -o docs/source -M;
cd docs && make html && open build/html/index.html;
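The .rst stubs written by sphinx-apidoc use automodule directives, so the build only finds the code if the autodoc extension is enabled and src is importable from the Sphinx configuration. A minimal excerpt of docs/source/conf.py is sketched below, assuming the packages live directly under ./src; napoleon is optional and only needed for Google- or NumPy-style docstrings.

```python
# docs/source/conf.py (excerpt), assuming packages live directly under ./src
import os
import sys

# Sphinx executes conf.py from the docs/source directory,
# so this relative path resolves to <repo>/src.
sys.path.insert(0, os.path.abspath("../../src"))

extensions = [
    "sphinx.ext.autodoc",   # renders the automodule directives generated by sphinx-apidoc
    "sphinx.ext.napoleon",  # optional: parses Google/NumPy-style docstrings
]
```

sphinx-apidoc also writes a modules.rst file by default; listing it in the toctree of index.rst makes the generated API pages reachable from the documentation's front page.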