Improve deployment #143
requirements.txt
Outdated
# Dev dependencies
bandit
black
flake8
isort
mypy
pytest
pytest-cov
safety
Should we separate these into `requirement-dev.txt` for local development and testing?
requirements.txt
Outdated
pygsheets
requests
searchconsole
StrEnum
Is `StrEnum` required? Could we use the trick here? https://docs.python.org/3.8/library/enum.html#others
I'll try to remove it.
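For reference, the "trick" in the linked enum docs is to mix a built-in type into a plain `Enum`, the same way the docs define `IntEnum` by mixing in `int`. A minimal sketch of what that could look like here, assuming a hypothetical `Environment` enum (not a name taken from this repo):

```python
from enum import Enum


class StrEnum(str, Enum):
    """Local stand-in for the third-party StrEnum package (illustrative only)."""


class Environment(StrEnum):  # hypothetical enum, used only for this example
    DEV = "dev"
    PROD = "prod"


# Members are real str instances, so they compare equal to plain strings
# and can be looked up by value.
is_str = isinstance(Environment.DEV, str)
matches_plain_string = Environment.DEV == "dev"
looked_up = Environment("prod")
```

When a plain string is needed, `.value` is probably the safest accessor, since the result of `str()` and `format()` on mixed-in enum members has varied across Python versions.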
pyproject.toml
Outdated
# azure-datalake-store, pip._vendor.packaging.requirements.InvalidRequirement
azure-datalake-store = ">=0.0.45"
azure-mgmt-datalake-store = ">=0.5.0"
# https://stackoverflow.com/questions/68687548/apache-airflow-airflow-initdb-throws-modulenotfounderror-no-module-named-wtf
marshmallow = "2.21.0"
wtforms = "2.3.3"
# https://stackoverflow.com/questions/66774109/install-airflow-importerror-no-module-named-clsregistry
sqlalchemy = "1.3.23"
flask-sqlalchemy = "2.4.4"

[tool.poetry.dev-dependencies]
safety = "^1.9.0"
I thought we were removing `poetry`. Am I missing anything?
Yes, poetry configuration should be removed to avoid any confusion.
@david30907d I need your help to fix the Python CI install dependencies step, as I don’t have permission to make changes.
> @david30907d I need your help to fix the Python CI install dependencies step, as I don’t have permission to make changes.

Why is that? You're already an admin of this repo 😂
constraints-3.8.txt
Outdated
@@ -0,0 +1,349 @@
# for apache-airflow==1.10.13
I would like to confirm where it comes from. Airflow Official
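One way to check the provenance is to compare the committed file against the constraints file the Airflow project publishes per release. The sketch below only builds that URL. The URL layout is an assumption based on the pattern described in Airflow's installation guide, and `constraints_url` is a hypothetical helper, so both are worth confirming before relying on them:

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of an Airflow constraints file.

    Assumes the `constraints-<airflow version>/constraints-<python version>.txt`
    layout on the apache/airflow GitHub repository.
    """
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


# Versions taken from the header comment and file name in this diff.
url = constraints_url("1.10.13", "3.8")
```

Diffing the downloaded file against `constraints-3.8.txt` would show whether any pins were edited by hand.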
@@ -1,4 +1,4 @@
# get the complete env from other volunteers, please
AIRFLOW_HOME=/opt/airflow
BIGQUERY_PROJECT=pycontw-225217
nitpick: I feel this is something we should make configurable. 🤔
Is AIRFLOW_HOME redundant?
no, I mean BIGQUERY_PROJECT
Developers may use their own GCP project for testing.
Yep, that's why I think making it configurable might be a good idea. WDYT?
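A minimal sketch of what "configurable" could mean on the Python side: read `BIGQUERY_PROJECT` from the environment and fall back to the project ID from the env file above. The helper name `bigquery_project` and the fallback behaviour are assumptions for illustration, not code from this PR:

```python
import os
from typing import Mapping, Optional

# Default taken from the env file shown in this diff.
DEFAULT_BIGQUERY_PROJECT = "pycontw-225217"


def bigquery_project(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the GCP project to use, preferring the BIGQUERY_PROJECT variable."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the shared default project.
    return env.get("BIGQUERY_PROJECT") or DEFAULT_BIGQUERY_PROJECT
```

A developer testing against a personal project would then only need to set `BIGQUERY_PROJECT` in their local env file, with no code changes.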
README.md
Outdated
- [Python 3.8+](https://www.python.org/downloads/release/python-3811/)
- [Docker](https://docs.docker.com/get-docker/)
- [Git](https://git-scm.com/book/zh-tw/v2/%E9%96%8B%E5%A7%8B-Git-%E5%AE%89%E8%A3%9D%E6%95%99%E5%AD%B8)
- [Poetry](https://python-poetry.org/docs/#installation) (Optional, only for creating virtual environments during development)
Should we add instructions on how to use `virtualenv` instead, or use tools like `pip-tools` or `Pipenv`? AFAIK, `poetry` does not use `pip` to resolve dependencies. I would suggest we either use `poetry` for all environments or not use it at all.
OK, I'll use the `venv` module instead.
@@ -27,7 +29,8 @@
 )
 with dag:
     if bool(os.getenv("AIRFLOW_TEST_MODE")):
-        FILENAMES: Dict[str, Dict] = {"fixtures/data_questionnaire.csv": {}}
+        filepath = Path(AIRFLOW_HOME) / "dags/fixtures/data_questionnaire.csv"
+        FILENAMES: Dict[str, Dict] = {str(filepath): {}}
nitpick: we could use `from __future__ import annotations` and replace
`FILENAMES: Dict[str, Dict] = {str(filepath): {}}`
with
`FILENAMES: dict[str, dict] = {str(filepath): {}}`
Thanks for your suggestion.
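The path change in this hunk can be sketched in isolation as follows. How the DAG module actually obtains `AIRFLOW_HOME` is not shown in the diff, so reading it from the environment with a `/opt/airflow` default is an assumption here, and `fixture_filenames` is a hypothetical helper:

```python
import os
from pathlib import Path


# Built-in generics such as dict[str, dict] work at runtime on Python 3.9+.
# On Python 3.8 this annotation needs `from __future__ import annotations`
# as the first statement of the file, so that it is not evaluated.
def fixture_filenames(airflow_home: str) -> dict[str, dict]:
    """Map the questionnaire fixture path, resolved under AIRFLOW_HOME, to its options."""
    filepath = Path(airflow_home) / "dags/fixtures/data_questionnaire.csv"
    return {str(filepath): {}}


# Assumption: AIRFLOW_HOME comes from the environment, defaulting to the
# value used by the official Docker image in this PR.
AIRFLOW_HOME = os.getenv("AIRFLOW_HOME", "/opt/airflow")
FILENAMES = fixture_filenames(AIRFLOW_HOME)
```

Anchoring the fixture path to `AIRFLOW_HOME` avoids depending on the working directory the task happens to run in, which appears to be the motivation for the original change.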

### Commit Message

It is recommended to use [Commitizen](https://commitizen-tools.github.io/commitizen/).
❤️
docs/DEPLOYMENT.md
Outdated
* ETL: `/home/zhangtaiwei/pycon-etl`
* Metabase is located here: `/mnt/disks/data-team-additional-disk/pycontw-infra-scripts/data_team/metabase_server`

2. Pull the latest codebase to this server: `sudo git pull`
`sudo` looks dangerous. Just want to confirm whether it's required.
This is an issue; the project should be moved to another path.
got it
upgrade airflow to v1.10.15, because v1.10.13 has been yanked.
6e17cfc to 011209e
4932299 to b047fa1
Types of Changes
Description
The main purpose of this PR is to improve our Airflow setup process, fix some errors, and make it easier for new contributors to use.
Below is the list of changes:
Airflow Configuration
- Change `AIRFLOW_HOME` from "/usr/local/airflow" to "/opt/airflow" because we have switched to the official Airflow Python 3.8 Docker image.

DAG files
- Fix `AIRFLOW_HOME`-related file path issues in some DAG tasks.

Python dependencies
- Use `pip` to manage dependencies instead of `poetry`.
- Update `requirements.txt` and add an Airflow constraints file.
- Remove the `poetry` configuration.

Update Dockerfile
- Use `pip` instead of `poetry`.
- `entrypoint.sh`: allow adding initialization tasks before starting Airflow services.
- `Dockerfile.test`.

Use docker-compose to deploy Airflow services
- `Makefile`: add related aliases to make it easier to use.

Update documentation
- `README.md`.
- `CONTRIBUTING.md`, `MAINTENANCE.md`, and `DEPLOYMENT.md`, to make it easier to understand how to contribute, maintain, and deploy.

Remove unused Node.js configuration files.
Steps to Test This Pull Request
Follow the steps for setting up the dev/test environment in `README.md`.