Turn optional-dependencies in pyproject.toml into dynamic property
While hatchling and pip currently handle dynamic replacement of the
dependencies nicely even when they are also defined statically, this is
not correct according to PEP 621. A project property that is marked as
dynamic must not also contain static values: it is either static or
dynamic.

This is not a problem for wheel packages installed by any standard
tool, because the wheel package already contains all the metadata (and
does not contain pyproject.toml). In various other cases, however
(such as installing airflow via a GitHub URL or from an sdist), it can
make a difference, depending on whether the tool installing airflow
uses pyproject.toml directly as an optimization or runs the build hooks
to prepare the dependencies.

This change makes all optional dependencies dynamic: rather than baking
them into pyproject.toml, we mark them as dynamic, so that any tool that
uses pyproject.toml or the sdist PKG-INFO knows that it has to run the
build hooks to get the actual optional dependencies.
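To make the PKG-INFO part concrete, here is a minimal sketch assuming the sdist uses core metadata 2.2 or later, where PEP 643 lets a build backend list the fields that may still change at build time under `Dynamic:`. Optional dependencies map to the `Provides-Extra` and `Requires-Dist` fields. The function names and the PKG-INFO excerpt are hypothetical, and whether a particular backend version emits these entries is not something this commit states.

```python
from email.parser import HeaderParser


def dynamic_fields(pkg_info_text: str) -> set[str]:
    """Return the core-metadata fields declared as Dynamic (PEP 643), lower-cased."""
    message = HeaderParser().parsestr(pkg_info_text)
    # Field names are case-insensitive, so normalise them before comparing.
    return {value.strip().lower() for value in message.get_all("Dynamic", [])}


def must_run_build_hooks_for_extras(pkg_info_text: str) -> bool:
    """True if a tool cannot rely on the static extras recorded in PKG-INFO."""
    dynamic = dynamic_fields(pkg_info_text)
    return "provides-extra" in dynamic or "requires-dist" in dynamic


# Hypothetical PKG-INFO excerpt; a real file contains many more fields.
pkg_info = """\
Metadata-Version: 2.2
Name: example
Version: 1.0.0
Dynamic: Provides-Extra
Dynamic: Requires-Dist
"""

print(sorted(dynamic_fields(pkg_info)))           # ['provides-extra', 'requires-dist']
print(must_run_build_hooks_for_extras(pkg_info))  # True
```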

There are a few consequences of that:

* our pyproject.toml no longer contains an automatically generated
  part, which is actually good, as that part caused some confusion

* all our dynamic optional dependencies are either defined in
  hatch_build.py or calculated there. This is a bit new but sounds
  reasonable, and those dependencies are not updated often, so it is
  not an issue to maintain them there

* the pre-commits that manage the optional dependencies are now much
  simpler, as a lot of code has been removed.
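The second point can be pictured with a small sketch. It is not Airflow's actual hatch_build.py: the extras dictionaries and function names are hypothetical, and how the resulting mapping is handed to hatchling is deliberately left out. It only shows the idea of keeping extras in Python and deriving bundle extras (such as `all-core` and `all`) from them, so nothing generated has to be written back into pyproject.toml.

```python
# Hypothetical extras for illustration; the real hatch_build.py defines many more.
CORE_EXTRAS: dict[str, list[str]] = {
    "graphviz": ["graphviz"],
    "pandas": ["pandas"],
}
PROVIDER_EXTRAS: dict[str, list[str]] = {
    "mysql": ["apache-airflow-providers-mysql"],
    "postgres": ["apache-airflow-providers-postgres"],
}


def bundle(*groups: dict[str, list[str]]) -> list[str]:
    """Merge the requirements of several groups of extras into one sorted, de-duplicated list."""
    merged: set[str] = set()
    for group in groups:
        for requirements in group.values():
            merged.update(requirements)
    return sorted(merged)


def optional_dependencies() -> dict[str, list[str]]:
    """Compute the complete optional-dependencies mapping, bundle extras included."""
    extras = {**CORE_EXTRAS, **PROVIDER_EXTRAS}
    # Bundle extras are derived at build time instead of being stored statically.
    extras["all-core"] = bundle(CORE_EXTRAS)
    extras["all"] = bundle(CORE_EXTRAS, PROVIDER_EXTRAS)
    return extras


for name, requirements in optional_dependencies().items():
    print(f"{name}: {requirements}")
```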
potiuk committed Mar 24, 2024
1 parent 25e5d54 commit 8313523
Showing 35 changed files with 1,331 additions and 1,840 deletions.
1 change: 0 additions & 1 deletion .dockerignore
@@ -54,7 +54,6 @@
 !Dockerfile
 !hatch_build.py
 !prod_image_installed_providers.txt
-!airflow_pre_installed_providers.txt
 
 # This folder is for you if you want to add any packages to the docker context when you build your own
 # docker image. most of other files and any new folder you add will be excluded by default
18 changes: 9 additions & 9 deletions .pre-commit-config.yaml
@@ -432,26 +432,26 @@ repos:
 additional_dependencies: ['setuptools', 'rich>=12.4.4', 'pyyaml', 'tomli']
 - id: check-extra-packages-references
 name: Checks setup extra packages
-description: Checks if all the extras defined in pyproject.toml are listed in extra-packages-ref.rst file
+description: Checks if all the extras defined in hatch_build.py are listed in extra-packages-ref.rst file
 language: python
-files: ^docs/apache-airflow/extra-packages-ref\.rst$|^pyproject.toml
+files: ^docs/apache-airflow/extra-packages-ref\.rst$|^hatch_build.py
 pass_filenames: false
 entry: ./scripts/ci/pre_commit/pre_commit_check_extra_packages_ref.py
-additional_dependencies: ['rich>=12.4.4', 'tomli', 'tabulate']
-- id: check-pyproject-toml-order
-name: Check order of dependencies in pyproject.toml
+additional_dependencies: ['rich>=12.4.4', 'hatchling==1.22.4', 'tabulate']
+- id: check-hatch-build-order
+name: Check order of dependencies in hatch_build.py
 language: python
-files: ^pyproject\.toml$
+files: ^hatch_build.py$
 pass_filenames: false
-entry: ./scripts/ci/pre_commit/pre_commit_check_order_pyproject_toml.py
-additional_dependencies: ['rich>=12.4.4']
+entry: ./scripts/ci/pre_commit/pre_commit_check_order_hatch_build.py
+additional_dependencies: ['rich>=12.4.4', 'hatchling==1.22.4']
 - id: update-extras
 name: Update extras in documentation
 entry: ./scripts/ci/pre_commit/pre_commit_insert_extras.py
 language: python
 files: ^contributing-docs/12_airflow_dependencies_and_extras.rst$|^INSTALL$|^airflow/providers/.*/provider\.yaml$|^Dockerfile.*
 pass_filenames: false
-additional_dependencies: ['rich>=12.4.4', 'tomli']
+additional_dependencies: ['rich>=12.4.4', 'hatchling==1.22.4']
 - id: check-extras-order
 name: Check order of extras in Dockerfile
 entry: ./scripts/ci/pre_commit/pre_commit_check_order_dockerfile_extras.py
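The renamed `check-hatch-build-order` hook now checks ordering in hatch_build.py instead of pyproject.toml. Its real implementation (`pre_commit_check_order_hatch_build.py`) is not part of this view, so the following is only a sketch of the kind of check such a hook might perform. It assumes the extras are available as a Python dict mapping extra names to requirement lists; `order_problems` and the example data are hypothetical.

```python
def order_problems(extras: dict[str, list[str]]) -> list[str]:
    """Report extras, and requirements within each extra, that are not alphabetically ordered."""
    problems = []
    names = list(extras)  # dicts preserve insertion order, i.e. the order in the source
    if names != sorted(names, key=str.lower):
        problems.append("extras are not in alphabetical order")
    for name, requirements in extras.items():
        if requirements != sorted(requirements, key=str.lower):
            problems.append(f"requirements of {name!r} are not in alphabetical order")
    return problems


# Hypothetical input: 'amazon' should come before 'mysql', and the mysql requirements are swapped.
example = {
    "mysql": ["mysqlclient", "apache-airflow-providers-mysql"],
    "amazon": ["apache-airflow-providers-amazon"],
}
for problem in order_problems(example):
    print(problem)
```

Comparing case-insensitively keeps the check stable for requirement names that mix capitalisation.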
11 changes: 8 additions & 3 deletions Dockerfile
@@ -455,13 +455,17 @@ function install_airflow_dependencies_from_branch_tip() {
 if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
 AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
 fi
+local TEMP_AIRFLOW_DIR
+TEMP_AIRFLOW_DIR=$(mktemp -d)
 # Install latest set of dependencies - without constraints. This is to download a "base" set of
 # dependencies that we can cache and reuse when installing airflow using constraints and latest
 # pyproject.toml in the next step (when we install regular airflow).
 set -x
-${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} \
-${ADDITIONAL_PIP_INSTALL_FLAGS} \
-"apache-airflow[${AIRFLOW_EXTRAS}] @ https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz"
+curl -fsSL "https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz" | \
+tar xvz -C "${TEMP_AIRFLOW_DIR}" --strip 1
+# Make sure editable dependencies are calculated when devel-ci dependencies are installed
+${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} \
+--editable "${TEMP_AIRFLOW_DIR}[${AIRFLOW_EXTRAS}]"
 set +x
 common::install_packaging_tools
 set -x
@@ -477,6 +481,7 @@ function install_airflow_dependencies_from_branch_tip() {
 set +x
 ${PACKAGING_TOOL_CMD} uninstall ${EXTRA_UNINSTALL_FLAGS} apache-airflow
 set -x
+rm -rvf "${TEMP_AIRFLOW_DIR}"
 # If you want to make sure dependency is removed from cache in your PR when you removed it from
 # pyproject.toml - please add your dependency here as a list of strings
 # for example:
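Instead of installing straight from the GitHub tarball URL, the Dockerfile now downloads the branch tarball, unpacks it into a temporary directory with `--strip 1` (dropping the `<repo>-<branch>/` directory that GitHub archives wrap around their contents), and installs that directory in editable mode. The extraction step can be sketched in Python as follows. The function names and the in-memory example archive are hypothetical; this is an illustration of what `--strip 1` does, not code used by the Dockerfile.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_stripping_top_dir(archive: bytes, destination: Path) -> list[Path]:
    """Extract regular files from a .tar.gz, dropping the first path component (like `tar --strip 1`)."""
    written = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = Path(member.name).parts[1:]  # drop the "<repo>-<branch>/" prefix
            if not parts or not member.isfile():
                continue  # skip the top-level directory, sub-directories and links
            if ".." in parts:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target = destination.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            written.append(target)
    return written


def make_example_archive() -> bytes:
    """Build a tiny in-memory archive shaped like a GitHub branch tarball (hypothetical content)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        data = b"[build-system]\n"
        info = tarfile.TarInfo("airflow-main/pyproject.toml")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_stripping_top_dir(make_example_archive(), Path(tmp))
    print([path.relative_to(tmp).as_posix() for path in extracted])  # ['pyproject.toml']
```

The extracted directory is then installed with `--editable "${TEMP_AIRFLOW_DIR}[${AIRFLOW_EXTRAS}]"`, which makes the packaging tool run the build hooks and compute the now-dynamic extras.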
12 changes: 8 additions & 4 deletions Dockerfile.ci
@@ -402,13 +402,17 @@ function install_airflow_dependencies_from_branch_tip() {
 if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
 AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
 fi
+local TEMP_AIRFLOW_DIR
+TEMP_AIRFLOW_DIR=$(mktemp -d)
 # Install latest set of dependencies - without constraints. This is to download a "base" set of
 # dependencies that we can cache and reuse when installing airflow using constraints and latest
 # pyproject.toml in the next step (when we install regular airflow).
 set -x
-${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} \
-${ADDITIONAL_PIP_INSTALL_FLAGS} \
-"apache-airflow[${AIRFLOW_EXTRAS}] @ https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz"
+curl -fsSL "https://github.com/${AIRFLOW_REPO}/archive/${AIRFLOW_BRANCH}.tar.gz" | \
+tar xvz -C "${TEMP_AIRFLOW_DIR}" --strip 1
+# Make sure editable dependencies are calculated when devel-ci dependencies are installed
+${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} \
+--editable "${TEMP_AIRFLOW_DIR}[${AIRFLOW_EXTRAS}]"
 set +x
 common::install_packaging_tools
 set -x
@@ -424,6 +428,7 @@ function install_airflow_dependencies_from_branch_tip() {
 set +x
 ${PACKAGING_TOOL_CMD} uninstall ${EXTRA_UNINSTALL_FLAGS} apache-airflow
 set -x
+rm -rvf "${TEMP_AIRFLOW_DIR}"
 # If you want to make sure dependency is removed from cache in your PR when you removed it from
 # pyproject.toml - please add your dependency here as a list of strings
 # for example:
@@ -1309,7 +1314,6 @@ COPY airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/
 COPY generated/* ${AIRFLOW_SOURCES}/generated/
 COPY constraints/* ${AIRFLOW_SOURCES}/constraints/
 COPY LICENSE ${AIRFLOW_SOURCES}/LICENSE
-COPY airflow_pre_installed_providers.txt ${AIRFLOW_SOURCES}/
 COPY hatch_build.py ${AIRFLOW_SOURCES}/
 COPY --from=scripts install_airflow.sh /scripts/docker/
 
140 changes: 101 additions & 39 deletions INSTALL
@@ -1,6 +1,7 @@
-# INSTALL / BUILD instructions for Apache Airflow
+INSTALL / BUILD instructions for Apache Airflow
 
-## Basic installation of Airflow from sources and development environment setup
+Basic installation of Airflow from sources and development environment setup
+============================================================================
 
 This is a generic installation method that requires minimum starndard tools to develop airflow and
 test it in local virtual environment (using standard CPyhon installation and `pip`).
@@ -23,7 +24,18 @@ MacOS (Mojave/Catalina) you might need to to install XCode command line tools an
 
 brew install sqlite mysql postgresql
 
-## Downloading and installing Airflow from sources
+The `pip` is one of the build packaging front-ends that might be used to install Airflow. It's the one
+that we recommend (see below) for reproducible installation of specific versions of Airflow.
+
+As of version 2.8 Airflow follows PEP 517/518 and uses `pyproject.toml` file to define build dependencies
+and build process and it requires relatively modern versions of packaging tools to get airflow built from
+local sources or sdist packages, as PEP 517 compliant build hooks are used to determine dynamic build
+dependencies. In case of `pip` it means that at least version 22.1.0 is needed (released at the beginning of
+2022) to build or install Airflow from sources. This does not affect the ability of installing Airflow from
+released wheel packages.
+
+Downloading and installing Airflow from sources
+-----------------------------------------------
 
 While you can get Airflow sources in various ways (including cloning https://github.com/apache/airflow/), the
 canonical way to download it is to fetch the tarball published at https://downloads.apache.org where you can
@@ -95,7 +107,8 @@ Airflow project contains some pre-defined virtualenv definitions in ``pyproject.
 easily used by hatch to create your local venvs. This is not necessary for you to develop and test
 Airflow, but it is a convenient way to manage your local Python versions and virtualenvs.
 
-## Installing Hatch
+Installing Hatch
+----------------
 
 You can install hat using various other ways (including Gui installers).
 
@@ -128,19 +141,21 @@ You can see the list of available envs with:
 
 This is what it shows currently:
 
-┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
-┃ Name ┃ Type ┃ Features ┃ Description ┃
-┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
-│ default │ virtual │ devel │ Default environment with Python 3.8 for maximum compatibility │
-├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
-│ airflow-38 │ virtual │ │ Environment with Python 3.8. No devel installed. │
-├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
-│ airflow-39 │ virtual │ │ Environment with Python 3.9. No devel installed. │
-├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
-│ airflow-310 │ virtual │ │ Environment with Python 3.10. No devel installed. │
-├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
-│ airflow-311 │ virtual │ │ Environment with Python 3.11. No devel installed │
-└─────────────┴─────────┴──────────┴───────────────────────────────────────────────────────────────┘
+┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Name ┃ Type ┃ Description ┃
+┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ default │ virtual │ Default environment with Python 3.8 for maximum compatibility │
+├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
+│ airflow-38 │ virtual │ Environment with Python 3.8. No devel installed. │
+├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
+│ airflow-39 │ virtual │ Environment with Python 3.9. No devel installed. │
+├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
+│ airflow-310 │ virtual │ Environment with Python 3.10. No devel installed. │
+├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
+│ airflow-311 │ virtual │ Environment with Python 3.11. No devel installed │
+├─────────────┼─────────┼───────────────────────────────────────────────────────────────┤
+│ airflow-312 │ virtual │ Environment with Python 3.11. No devel installed │
+└─────────────┴─────────┴───────────────────────────────────────────────────────────────┘
 
 The default env (if you have not used one explicitly) is `default` and it is a Python 3.8
 virtualenv for maximum compatibility with `devel` extra installed - this devel extra contains the minimum set
@@ -229,7 +244,8 @@ and install to latest supported ones by pure airflow core.
 pip install -e ".[devel]" \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"
 
-## All airflow extras
+Airflow extras
+==============
 
 Airflow has a number of extras that you can install to get additional dependencies. They sometimes install
 providers, sometimes enable other features where packages are not installed by default.
@@ -239,36 +255,69 @@ https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html
 
 The list of available extras is below.
 
-Regular extras that are available for users in the Airflow package.
+Core extras
+-----------
+
+Those extras are available as regular core airflow extras - they install optional features of Airflow.
+
+# START CORE EXTRAS HERE
+
+aiobotocore, apache-atlas, apache-webhdfs, async, cgroups, deprecated-api, github-enterprise,
+google-auth, graphviz, kerberos, ldap, leveldb, otel, pandas, password, pydantic, rabbitmq, s3fs,
+saml, sentry, statsd, uv, virtualenv
+
+# END CORE EXTRAS HERE
 
-# START REGULAR EXTRAS HERE
+Provider extras
+---------------
 
-aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache-
-cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala,
-apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs,
-apprise, arangodb, asana, async, atlas, atlassian-jira, aws, azure, cassandra, celery, cgroups,
-cloudant, cncf-kubernetes, cohere, common-io, common-sql, crypto, databricks, datadog, dbt-cloud,
-deprecated-api, dingding, discord, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp,
-gcp_api, github, github-enterprise, google, google-auth, graphviz, grpc, hashicorp, hdfs, hive,
-http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, ldap, leveldb, microsoft-azure,
-microsoft-mssql, microsoft-psrp, microsoft-winrm, mongo, mssql, mysql, neo4j, odbc, openai,
-openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, papermill, password,
-pgvector, pinecone, pinot, postgres, presto, pydantic, qdrant, rabbitmq, redis, s3, s3fs,
-salesforce, samba, saml, segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake,
-spark, sqlite, ssh, statsd, tableau, tabular, telegram, teradata, trino, uv, vertica, virtualenv,
-weaviate, webhdfs, winrm, yandex, zendesk
+Those extras are available as regular Airflow extras, they install provider packages in standard builds
+or dependencies that are necessary to enable the feature in editable build.
 
-# END REGULAR EXTRAS HERE
+# START PROVIDER EXTRAS HERE
 
-Devel extras - used to install development-related tools. Only available during editable install.
+airbyte, alibaba, amazon, apache.beam, apache.cassandra, apache.drill, apache.druid, apache.flink,
+apache.hdfs, apache.hive, apache.impala, apache.kafka, apache.kylin, apache.livy, apache.pig,
+apache.pinot, apache.spark, apprise, arangodb, asana, atlassian.jira, celery, cloudant,
+cncf.kubernetes, cohere, common.io, common.sql, databricks, datadog, dbt.cloud, dingding, discord,
+docker, elasticsearch, exasol, fab, facebook, ftp, github, google, grpc, hashicorp, http, imap,
+influxdb, jdbc, jenkins, microsoft.azure, microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo,
+mysql, neo4j, odbc, openai, openfaas, openlineage, opensearch, opsgenie, oracle, pagerduty,
+papermill, pgvector, pinecone, postgres, presto, qdrant, redis, salesforce, samba, segment,
+sendgrid, sftp, singularity, slack, smtp, snowflake, sqlite, ssh, tableau, tabular, telegram,
+teradata, trino, vertica, weaviate, yandex, zendesk
+
+# END PROVIDER EXTRAS HERE
+
+Devel extras
+------------
+
+The `devel` extras are not available in the released packages. They are only available when you install
+Airflow from sources in `editable` installation - i.e. one that you are usually using to contribute to
+Airflow. They provide tools such as `pytest` and `mypy` for general purpose development and testing.
 
 # START DEVEL EXTRAS HERE
 
-devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-
-hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests
+devel, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-hadoop,
+devel-mypy, devel-sentry, devel-static-checks, devel-tests
 
 # END DEVEL EXTRAS HERE
 
+Bundle extras
+-------------
+
+Those extras are bundles dynamically generated from other extras.
+
+# START BUNDLE EXTRAS HERE
+
+all, all-core, all-dbs, devel-all, devel-ci
+
+# END BUNDLE EXTRAS HERE
+
+
+Doc extras
+----------
+
 Doc extras - used to install dependencies that are needed to build documentation. Only available during
 editable install.
 
@@ -278,7 +327,20 @@ doc, doc-gen
 
 # END DOC EXTRAS HERE
 
-## Compiling front end assets
+Deprecated extras
+-----------------
+
+The `deprecated` extras are deprecated extras from Airflow 1 that will be removed in future versions.
+
+# START DEPRECATED EXTRAS HERE
+
+atlas, aws, azure, cassandra, crypto, druid, gcp, gcp-api, hdfs, hive, kubernetes, mssql, pinot, s3,
+spark, webhdfs, winrm
+
+# END DEPRECATED EXTRAS HERE
+
+Compiling front end assets
+--------------------------
 
 Sometimes you can see that front-end assets are missing and website looks broken. This is because
 you need to compile front-end assets. This is done automatically when you create a virtualenv
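The new INSTALL text states that building or installing Airflow from sources or an sdist needs at least pip 22.1.0, because PEP 517 build hooks determine the dynamic dependencies. A rough, standard-library-only sketch of checking an environment against that minimum follows. `release` and `pip_is_new_enough` are hypothetical helpers: they compare only the numeric release segment and ignore pre-release, post-release and dev suffixes. A stricter check would use `packaging.version.Version` instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM_PIP = (22, 1, 0)  # minimum stated in INSTALL for building Airflow from sources


def release(text: str) -> tuple[int, ...]:
    """Numeric release segment of a version string, padded to three components.

    Pre-release, post-release and dev suffixes are ignored, so this is only a rough check.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    return parts + (0,) * (3 - len(parts))


def pip_is_new_enough(pip_version: str) -> bool:
    return release(pip_version) >= MINIMUM_PIP


try:
    installed = version("pip")
    print(f"pip {installed}: new enough to build from sources? {pip_is_new_enough(installed)}")
except PackageNotFoundError:
    print("pip is not installed in this environment")
```

Padding to three components matters: without it, `(22, 1)` would compare as smaller than `(22, 1, 0)`.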
2 changes: 1 addition & 1 deletion airflow_pre_installed_providers.txt
@@ -1,7 +1,7 @@
 # List of all the providers that are pre-installed when you run `pip install apache-airflow` without extras
 common.io
 common.sql
-fab>=1.0.2dev0
+fab>=1.0.2dev1
 ftp
 http
 imap
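Each line of `airflow_pre_installed_providers.txt` is a provider id, optionally followed by a version specifier such as `>=1.0.2dev1`. How hatch_build.py consumes the file is not shown in this view. The sketch below only illustrates turning such lines into requirement strings, assuming Airflow's provider distribution naming convention (`apache-airflow-providers-` followed by the provider id with dots replaced by dashes). `provider_requirements` is a hypothetical name, and the input is just the excerpt visible in this diff.

```python
import re

# Excerpt of the file as shown in the diff above; the real file may list more providers.
PRE_INSTALLED = """\
# List of all the providers that are pre-installed when you run `pip install apache-airflow` without extras
common.io
common.sql
fab>=1.0.2dev1
ftp
http
imap
"""


def provider_requirements(text: str) -> list[str]:
    """Turn 'provider-id[specifier]' lines into provider distribution requirements."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split the provider id from an optional version specifier such as '>=1.0.2dev1'.
        match = re.fullmatch(r"([a-z0-9.]+)(.*)", line)
        if match is None:
            raise ValueError(f"cannot parse line: {line!r}")
        provider_id, specifier = match.groups()
        requirements.append(f"apache-airflow-providers-{provider_id.replace('.', '-')}{specifier}")
    return requirements


for requirement in provider_requirements(PRE_INSTALLED):
    print(requirement)
```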
2 changes: 1 addition & 1 deletion clients/python/pyproject.toml
@@ -16,7 +16,7 @@
 # under the License.
 
 [build-system]
-requires = ["hatchling"]
+requires = ["hatchling==1.22.4"]
 build-backend = "hatchling.build"
 
 [project]