Airflow main concepts 🔝

We briefly introduce the Airflow main concepts using the hello_world.py file that you can find in the repository (inside the dags folder).

Example: Hello World DAG

A workflow is a sequence of tasks organised in a way that reflects their relationships and dependencies.

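In Airflow, a workflow is represented as a DAG (Directed Acyclic Graph): the tasks are the nodes, the dependencies are the directed edges, and no chain of dependencies can loop back to a task that has already run.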

In the hello_world.py file, the workflow is represented as a graph with two nodes: dummy_task_id and hello_task_id.

hello world graph
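
To make the "directed acyclic" part concrete, here is a minimal sketch in plain Python (no Airflow involved) that models the two-node graph with the standard library's graphlib module. It assumes that hello_task_id depends on dummy_task_id; the direction of the edge and the helper name is_acyclic are assumptions made for illustration, not something taken from hello_world.py.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks it depends on.
# Assumption: hello_task_id runs after dummy_task_id.
dependencies = {
    "dummy_task_id": set(),
    "hello_task_id": {"dummy_task_id"},
}

# A valid execution order exists only if the graph has no cycles.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['dummy_task_id', 'hello_task_id']


def is_acyclic(deps):
    """Return True if the dependency mapping contains no cycle."""
    try:
        list(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False


# Two tasks that wait on each other can never be scheduled.
print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False
```

The second call shows why the graph must be acyclic: if two tasks depend on each other, there is no order in which they can run.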

The Python code that defines the hello_world DAG is the following:

hello world dag

The code above contains the following elements:

  • DAG:
    • describes how to run a workflow
  • TASKS:
    • determine what actually gets done
    • are parameterised instances of operators
    • have a state (e.g. queued, running, failed)
  • OPERATORS:
    • are the blueprints that define what a task has to do
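
The relationship between operators and tasks can be sketched in plain Python. This is only an analogy and not Airflow's implementation: the class CallableOperator, its run method, and the task id broken_task_id are hypothetical names introduced for illustration. The class plays the role of the blueprint, and each instance created with its own parameters plays the role of a task with a state.

```python
from typing import Any, Callable


class CallableOperator:
    """Hypothetical blueprint: describes how to run a Python callable."""

    def __init__(self, task_id: str, python_callable: Callable[[], Any]):
        self.task_id = task_id
        self.python_callable = python_callable
        self.state = "queued"  # every task starts out waiting to run
        self.result = None

    def run(self) -> str:
        """Execute the callable and record the resulting state."""
        self.state = "running"
        try:
            self.result = self.python_callable()
            self.state = "success"
        except Exception:
            self.state = "failed"
        return self.state


# Two tasks: the same blueprint, instantiated with different parameters.
hello_task = CallableOperator("hello_task_id", lambda: "Hello world!")
broken_task = CallableOperator("broken_task_id", lambda: 1 / 0)

hello_task.run()
broken_task.run()
print(hello_task.task_id, hello_task.state)
print(broken_task.task_id, broken_task.state)
```

In Airflow itself, the scheduler and executor track task state; the sketch only mirrors the idea that one operator can produce many differently parameterised tasks.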

Examples of Operators:

  • PythonOperator, to run a Python callable
  • SqliteOperator, to execute a query on a SQLite DB
  • BashOperator, to execute bash commands
  • RedshiftToS3Transfer, to run a Redshift UNLOAD command that writes a CSV with headers to Amazon S3
  • GoogleCloudStorageToBigQueryOperator, to load files from Google Cloud Storage into BigQuery

After this short explanation we can introduce the Exercises: Airflow for training and predicting.