Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve deployment #143

Merged
merged 40 commits into from
Sep 1, 2024
Merged

Improve deployment #143

merged 40 commits into from
Sep 1, 2024

Conversation

henry410213028
Copy link
Collaborator

@henry410213028 henry410213028 commented Jul 7, 2024

Types of Changes

  • Bugfix
  • Refactoring
  • Documentation Update

Description

The main purpose of this PR is to improve our Airflow setup process, fix some errors, and make it easier for new contributors to use.

Below is the list of changes:

  • Airflow Configuration

    • Change AIRFLOW_HOME from "/usr/local/airflow" to "/opt/airflow" because we have switched to the official Airflow Python 3.8 Docker image.
    • Disable the example DAGs.
  • DAG files

    • Fix AIRFLOW_HOME related file path issues in some DAG tasks.
  • Python dependencies

    • Use pip to manage dependencies, not use poetry.
    • Update requirements.txt and add Airflow constraints file.
    • Remove poetry configuration.
  • Update Dockerfile

    • Install dependencies via pip instead of poetry.
    • Use entrypoint.sh to allow adding initialization tasks before starting Airflow services.
    • Remove unused Dockerfile.test.
  • Use docker-compose to deploy Airflow services

    • Update Makefile and add related aliases to make it easier to use.
    • Use an external Docker volume to store Airflow data to prevent data loss when restarting the container.
  • Update documentation

    • Remove outdated information in README.md.
    • Move some information to CONTRIBUTING.md, MAINTENANCE.md, and DEPLOYMENT.md to make it easier to understand how to contribute, maintain, and deploy.
  • Remove unused Node.js configuration files.

Steps to Test This Pull Request

Follow the steps for setting up the dev/test environment in README.md.

@Lee-W Lee-W self-requested a review July 21, 2024 02:37
requirements.txt Outdated
Comment on lines 14 to 22
# Dev dependencies
bandit
black
flake8
isort
mypy
pytest
pytest-cov
safety
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we separate it as requirement-dev.txt for local development testing?

requirements.txt Outdated
pygsheets
requests
searchconsole
StrEnum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is StrEnum required? Could we use the trick here? https://docs.python.org/3.8/library/enum.html#others

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to remove it.

pyproject.toml Outdated
Comment on lines 23 to 34
# azure-datalake-store, pip._vendor.packaging.requirements.InvalidRequirement
azure-datalake-store = ">=0.0.45"
azure-mgmt-datalake-store = ">=0.5.0"
# https://stackoverflow.com/questions/68687548/apache-airflow-airflow-initdb-throws-modulenotfounderror-no-module-named-wtf
marshmallow = "2.21.0"
wtforms = "2.3.3"
# https://stackoverflow.com/questions/66774109/install-airflow-importerror-no-module-named-clsregistry
sqlalchemy = "1.3.23"
flask-sqlalchemy = "2.4.4"

[tool.poetry.dev-dependencies]
safety = "^1.9.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were removing poetry. am I miss anything?

Copy link
Collaborator Author

@henry410213028 henry410213028 Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, poetry configuration should be removed to avoid any confusion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david30907d I need your help to fix the Python CI install dependencies step, as I don’t have permission to make changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@david30907d I need your help to fix the Python CI install dependencies step, as I don’t have permission to make changes.

why is that, you're already admin of this repo 😂

@@ -0,0 +1,349 @@
# for apache-airflow==1.10.13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to confirm where it comes from. Airflow Official

@@ -1,4 +1,4 @@
# get the complete env from other volunteers, please
AIRFLOW_HOME=/opt/airflow
BIGQUERY_PROJECT=pycontw-225217
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I feel this might be something we should make it configurable. 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AIRFLOW_HOME redundant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I mean BIGQUERY_PROJECT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

developer may use their own GCP project for testing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's why I think making it configurable might be a good idea. WDYT?

README.md Outdated
- [Python 3.8+](https://www.python.org/downloads/release/python-3811/)
- [Docker](https://docs.docker.com/get-docker/)
- [Git](https://git-scm.com/book/zh-tw/v2/%E9%96%8B%E5%A7%8B-Git-%E5%AE%89%E8%A3%9D%E6%95%99%E5%AD%B8)
- [Poetry](https://python-poetry.org/docs/#installation) (Optional, only for creating virtual environments during development)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add instruction on how to virtualenv instead? or use tools like pip-tools, Pipenv. AFAIK, poetry does not use pip to resolve dep. I would suggest we either use poetry for all env or not use it at all

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I use venv module instead.

@@ -27,7 +29,8 @@
)
with dag:
if bool(os.getenv("AIRFLOW_TEST_MODE")):
FILENAMES: Dict[str, Dict] = {"fixtures/data_questionnaire.csv": {}}
filepath = Path(AIRFLOW_HOME) / "dags/fixtures/data_questionnaire.csv"
FILENAMES: Dict[str, Dict] = {str(filepath): {}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: we could use from __future__ import annotations and replace it as the following

Suggested change
FILENAMES: Dict[str, Dict] = {str(filepath): {}}
FILENAMES: dict[str, dict] = {str(filepath): {}}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks your suggestion.


### Commit Message

It is recommended to use [Commitizen](https://commitizen-tools.github.io/commitizen/).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

* ETL: `/home/zhangtaiwei/pycon-etl`
* Metabase is located here: `/mnt/disks/data-team-additional-disk/pycontw-infra-scripts/data_team/metabase_server`

2. Pull the latest codebase to this server: `sudo git pull`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sudo looks dangerous. Just want to confirm whether it's required

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a issue, project path should be move to another path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

@henry410213028
Copy link
Collaborator Author

upgrade airflow to v1.10.15, due to v1.10.13 has been yanked.

@henry410213028 henry410213028 merged commit 778ad30 into master Sep 1, 2024
2 checks passed
@henry410213028 henry410213028 deleted the improve-deployment branch September 1, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants