🎉 What if you could instantly sync DAG changes from Git to Airflow? Well now you can!
Airflow Git Sync provides automated DAG deployments from Git for Airflow environments. It syncs your pipeline code from a Git repository into the Airflow DAG folder.
Keeping DAGs directly in Airflow servers makes management challenging. Code changes require manual syncing to containers. There is no version control or history.
If you have ever worked with Airflow on Kubernetes, you know it can sync DAGs with your repository (as a GitOps solution) using the git-sync sidecar container. Without Kubernetes, it is hard to keep Airflow's DAGs directory (located at /opt/airflow/dags/) in sync with the changes you make to your DAGs, and in some cases you must restart the Airflow service or container.
The project introduces a git-sync application that runs alongside Airflow. It clones your configured DAG Git repository and syncs its contents to Airflow's DAG directory.
The syncing is achieved via a lightweight Docker container that runs periodically using inotify wait to detect file changes. The container can be deployed using docker-compose alongside Airflow. Here is a bit of the docker-compose file:
airflow-webserver:
  # Airflow container
airflow-scheduler:
  # Airflow container
git-sync:
  # Git-sync container
  image: databurst/git-sync:latest
  environment:
    REPO_URL: <dags_git_repo_url>.git
    #...other config
The git-sync container will keep DAGs in Airflow containers continually synced from files committed to the Git repository.
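To make the idea concrete, here is a minimal sketch of what a polling sync loop could look like. It is not the project's actual script: the function names `source_dir` and `sync_once` and the `--loop` flag are hypothetical, and the variable names and defaults are borrowed from the configuration table in this README on the assumption that the container behaves similarly.

```shell
#!/bin/sh
# Hypothetical sketch of a polling sync loop; NOT the project's real script.
set -eu

# Names mirror this README's configuration table; behavior is an assumption.
GIT_BRANCH="${GIT_BRANCH:-main}"
DIRECTORY_NAME="${DIRECTORY_NAME:-project}"
DESTINATION_PATH="${DESTINATION_PATH:-/app/sync}"
SUBFOLDER_PATH="${SUBFOLDER_PATH:-}"
INTERVAL="${INTERVAL:-10}"

# Print the directory whose contents should be copied.
# $1 = clone directory, $2 = optional sub-folder (may be empty).
source_dir() {
  if [ -n "$2" ]; then
    printf '%s/%s\n' "$1" "${2#/}"
  else
    printf '%s\n' "$1"
  fi
}

# Clone on the first run, pull afterwards, then copy the files across.
# Note: without a sub-folder this simple copy also includes the .git directory.
sync_once() {
  if [ -d "$DIRECTORY_NAME/.git" ]; then
    git -C "$DIRECTORY_NAME" pull --quiet origin "$GIT_BRANCH"
  else
    git clone --quiet --branch "$GIT_BRANCH" "$REPO_URL" "$DIRECTORY_NAME"
  fi
  mkdir -p "$DESTINATION_PATH"
  cp -R "$(source_dir "$DIRECTORY_NAME" "$SUBFOLDER_PATH")/." "$DESTINATION_PATH/"
}

# Loop only when asked to, so the file can also be sourced for its functions.
if [ "${1:-}" = "--loop" ]; then
  : "${REPO_URL:?REPO_URL is required}"
  while true; do
    sync_once || echo "sync failed; retrying in ${INTERVAL}s" >&2
    sleep "$INTERVAL"
  done
fi
```

Saved under a name of your choice, the sketch would be started with the `--loop` argument and `REPO_URL` set in the environment. A failed pull is logged and retried on the next interval rather than stopping the loop.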
The end result is Airflow DAGs can be managed via Git best practices. Changes are automatically reflected in your pipeline deployment. No need for complex Kubernetes just to get basic Git sync!
Important Tip:
Before you can use the project, you need to ensure that Airflow has the correct permissions for the required directories, as described in Airflow's documentation. To do this, run the following commands in the directory where your docker-compose.yaml file is located:
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env
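If you want to confirm the result before starting the containers, a small check like the one below may help. It is only a sketch: `check_airflow_setup` is a hypothetical helper, not part of the project, and it assumes the directory names and the `AIRFLOW_UID` entry shown in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: verify the directories and .env file created above.
# $1 = directory that contains docker-compose.yaml (defaults to the current one).
check_airflow_setup() {
  project_dir="${1:-.}"
  status=0
  for d in dags logs plugins config; do
    if [ ! -d "$project_dir/$d" ]; then
      echo "missing directory: $project_dir/$d" >&2
      status=1
    fi
  done
  # The .env file should contain a numeric AIRFLOW_UID entry.
  if ! grep -q '^AIRFLOW_UID=[0-9][0-9]*$' "$project_dir/.env" 2>/dev/null; then
    echo "AIRFLOW_UID is not set in $project_dir/.env" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_airflow_setup .` returns a non-zero status and prints what is missing if any directory or the `AIRFLOW_UID` entry is absent.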
- Generate an SSH Key: If you don't already have an SSH key, you can generate one using the following steps:
  - Open a Terminal: Open your terminal or command prompt.
  - Generate SSH Key: Run the following command to generate a new SSH key:
    ssh-keygen -t <key-type> # example: ssh-keygen -t ed25519
    Replace <key-type> with the desired key type (e.g., ed25519, rsa).
  - Follow Prompts: You'll be prompted to choose a location for your SSH key. Press Enter to accept the default location (usually ~/.ssh/id_<key-type>) or specify a different one.
- Adding SSH Key to Your Git Account: To use your SSH key with Git, you need to add your public key to your Git account. Here's how:
  - Go to your Git account settings on the web (e.g., GitHub, GitLab).
  - Navigate to "SSH and GPG keys" or a similar section.
  - Click "New SSH key" or equivalent.
  - Paste your public key into the provided field and give it a meaningful title.
- Updating Docker Compose: To ensure that your SSH key is correctly mounted in the git-sync container, modify the relevant line in your docker-compose.yaml file as follows:
    - ${GIT_SSH_KEY:-~/.ssh/<ssh_private_key_file_name>}:/root/.ssh/<ssh_private_key_file_name>
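The key steps above can also be scripted. The sketch below creates a dedicated key pair without interactive prompts and prints the public key to paste into your Git account. The helper name `generate_sync_key`, the key comment, and the example file name are assumptions made for illustration; only standard `ssh-keygen` options are used.

```shell
#!/bin/sh
# Hypothetical helper: create an Ed25519 key pair without interactive prompts.
# $1 = path of the private key file to create.
generate_sync_key() {
  key_path="$1"
  if [ -e "$key_path" ]; then
    echo "refusing to overwrite existing key: $key_path" >&2
    return 1
  fi
  mkdir -p "$(dirname "$key_path")"
  # -N "" sets an empty passphrase so the key can be used unattended.
  ssh-keygen -q -t ed25519 -N "" -C "airflow-git-sync" -f "$key_path"
  chmod 600 "$key_path"
  # The public half is what gets pasted into the Git account settings.
  cat "$key_path.pub"
}
```

For example, `generate_sync_key "$HOME/.ssh/id_ed25519_airflow"` followed by `export GIT_SSH_KEY="$HOME/.ssh/id_ed25519_airflow"` would point the volume line above at the new key, assuming the file name in the container path is updated to match.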
Using Airflow Git Sync is simple:
- Clone the repository.
- Configure git-sync via environment variables in the docker-compose.yaml file:

  | Variable | Description | Default Value |
  | --- | --- | --- |
  | REPO_URL | The URL of the Git repository to sync | git@github.com:data-burst/airflow_git_sync.git (required) |
  | SUBFOLDER_PATH | The repository sub-folder to sync. Leaving it empty copies the entire repo | N/A (optional) |
  | GIT_BRANCH | The Git branch to sync | main (optional) |
  | DIRECTORY_NAME | The name of the directory to clone the repository into | project (optional) |
  | DESTINATION_PATH | The path to sync the repository to | /app/sync (optional) |
  | INTERVAL | The interval (in seconds) to sync the repository | 10 (optional) |
  | GIT_PULL_REBASE | Determines the Git pull strategy. If set to true, it configures git config pull.rebase to use rebase during pulls. If false, it defaults to merge. | false (optional) |

- To deploy Airflow with the configured git-sync, simply run the docker compose up -d command.
- Enjoy!
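Before running docker compose up -d, it can be useful to sanity-check the values you plan to pass in. The function below is a sketch only: `validate_sync_config` is a hypothetical helper, and the rules it applies (a non-empty `REPO_URL`, a whole-number `INTERVAL`, and `true` or `false` for `GIT_PULL_REBASE`) are inferred from the variable descriptions above rather than taken from the container's own validation.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the git-sync environment variables.
# Rules are inferred from the README's variable table, not from the image.
validate_sync_config() {
  status=0
  if [ -z "${REPO_URL:-}" ]; then
    echo "REPO_URL is required" >&2
    status=1
  fi
  # INTERVAL falls back to the documented default of 10 when unset.
  case "${INTERVAL:-10}" in
    *[!0-9]*)
      echo "INTERVAL must be a whole number of seconds" >&2
      status=1
      ;;
  esac
  case "${GIT_PULL_REBASE:-false}" in
    true|false) ;;
    *)
      echo "GIT_PULL_REBASE must be true or false" >&2
      status=1
      ;;
  esac
  return "$status"
}
```

One way to use it is `validate_sync_config && docker compose up -d`, so the deployment only starts when the checks pass.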
This section lists common issues you may run into and how to resolve them.
Internet Connection Issue
If you see the following error when running the docker logs -f <container-name> command, the likely cause is that you are connected to a VPN:
getaddrinfo github.com: Try again
ssh: Could not resolve hostname github.com: Try again
fatal: Could not read from remote repository.
For more information, check out this link.
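To narrow the problem down, you can check whether the Git host in your REPO_URL resolves at all, for instance once with the VPN connected and once without. The sketch below is an assumption-laden helper, not part of the project: `repo_host` and `check_dns` are hypothetical names, and the lookup relies on `getent`, which may not be available on every system.

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing the DNS error shown above.

# Print the host part of an SSH-style (user@host:path) or URL-style remote.
repo_host() {
  case "$1" in
    *://*)
      host="${1#*://}"      # drop the scheme
      host="${host%%/*}"    # drop the path
      host="${host##*@}"    # drop any user part
      host="${host%%:*}"    # drop any port
      ;;
    *@*:*)
      host="${1#*@}"        # drop the user part
      host="${host%%:*}"    # drop the path
      ;;
    *)
      return 1
      ;;
  esac
  printf '%s\n' "$host"
}

# Return 0 if the remote's host resolves, 1 if not, 2 if it cannot be checked.
check_dns() {
  target="$(repo_host "$1")" || { echo "cannot parse host from: $1" >&2; return 2; }
  if ! command -v getent >/dev/null 2>&1; then
    echo "getent is not available on this system" >&2
    return 2
  fi
  if getent hosts "$target" >/dev/null; then
    echo "$target resolves"
  else
    echo "$target does not resolve" >&2
    return 1
  fi
}
```

If `check_dns "$REPO_URL"` succeeds only with the VPN disconnected, the VPN's DNS settings are the most likely culprit.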
We welcome contributions to this repository! If you’re interested in contributing, please take a look at our CONTIRIBUTION.md file for more information on how to get started. We look forward to collaborating with you!
This repository is licensed under the MIT License, which is a permissive open-source license that allows for reuse and modification of the code with few restrictions. You can find the full text of the license in this file.