-
-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler defaults to 5 runs, so it goes into a CrashLoopBackoff when deployed #20
Comments
airflow is weird. The whole purpose of this setting is to let the scheduler kills itself periodically to reload DAGs. In kubernetes this does not have a huge impact since it will be restarted automatically, and the whole kill/restart cicle can take a while, but airflow does not do sub seconds precision. -1 means you can never update your DAG, 1 means scheduler kills itself at every task launch |
I feel like it would have less impact in Docker, but with Kube managing the pods it ends up putting the cluster in a not-happy state with backoffs because the exiting script looks like a crash. Thanks for the heads up on the motivation to exit after a few task runs - I'm going to modify the start shell script in my version of the Docker container. I think it makes sense to run the scheduler in a while loop but break if it returns a bad error code, so Kube can still manage those incidents as real crashes. |
feel free to submit a pull request. I do have my scheduler restarting regularly, I don't see problems except it takes a few minutes to power on (so delaying next dag start) |
The issue that I had with kubernetes is that it tracks the number of restarts, so if you run this application indefinitely you could see large reset numbers over a long period of time which would be a red flag to an administrator that runs "kubectl get pods" on the cluster, unless I am understanding it wrong. As a solution, maybe this pod could be run as a kubernetes cronjob or kubernetes job. Would this break the way the scheduler works?
|
@gsemet How/where did you change the config for the scheduler to restart automatically? I'm not seeing it in airflow.cfg. |
@gsemet when scheduler args n != -1 it will restart and then go to CrashLoopBackOff later. You can see it in helm chart |
kube-airflow/airflow.all.yaml
Line 194 in 01ae78a
This causes the file read loop to happen five times, then the scheduler exits. It seems like a strange default setup.
I'm a bit confused why this is set this way - shouldn't the scheduler be looping indefinitely? I'm also seeing the scheduler failing to queue up tasks, same as #19, and I wonder if this is the cause in that case, or something else.
The text was updated successfully, but these errors were encountered: