Airflow main concepts
We briefly introduce the main Airflow concepts using the `hello_world.py` file that you can find in the repository (inside the `dags` folder).
A workflow is a sequence of tasks organised in a way that reflects their relationships and dependencies. In Airflow a workflow is represented as a DAG (Directed Acyclic Graph).
In the `hello_world.py` file the workflow is represented as a graph with two nodes: `dummy_task_id` and `hello_task_id`.
The Python code that defines the `hello_world` DAG is the following:
In the code above you can see the following elements:
- DAG:
  - describes how to run a workflow
- TASKS:
  - determine what actually gets done
  - are parameterised instances of operators
  - have a status (e.g. `queued`, `running`, `failed`)
- OPERATORS:
  - are the blueprints for defining what tasks have to get done
Examples of operators:

- `PythonOperator`, to run a Python callable
- `SqliteOperator`, to execute a query on a SQLite DB
- `BashOperator`, to execute bash commands
- `RedshiftToS3Transfer`, to execute an UNLOAD command to Amazon S3 as a CSV with headers
- `GoogleCloudStorageToBigQueryOperator`, to load files from Google Cloud Storage into BigQuery
After this short overview we can introduce the exercises: Airflow for training and predicting.