
Scheduler defaults to 5 runs, so it goes into a CrashLoopBackoff when deployed #20

Open
jdavidheiser opened this issue Dec 8, 2017 · 6 comments

Comments

@jdavidheiser

jdavidheiser commented Dec 8, 2017

args: ["scheduler", "-n", "5"]

This causes the file read loop to happen five times, then the scheduler exits. It seems like a strange default setup.

I'm a bit confused about why it's set up this way - shouldn't the scheduler be looping indefinitely? I'm also seeing the scheduler fail to queue up tasks, same as #19, and I wonder whether this is the cause there, or something else.

@gsemet
Contributor

gsemet commented Dec 9, 2017

airflow is weird. The whole purpose of this setting is to let the scheduler kill itself periodically to reload DAGs. In Kubernetes this does not have a huge impact, since it will be restarted automatically, and while the whole kill/restart cycle can take a while, airflow does not do sub-second precision anyway.

-1 means you can never update your DAG, 1 means the scheduler kills itself at every task launch
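
For reference, the setting being discussed is the scheduler's -n / --num_runs flag in the Airflow 1.x CLI; a quick sketch of the two extremes described above:

# Exit after 5 scheduler loops; the supervisor (Kubernetes here) restarts
# the process, which is what forces the DAG reload described above.
airflow scheduler -n 5

# Run indefinitely; per the comment above, DAGs are then never reloaded.
airflow scheduler -n -1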

@jdavidheiser
Author

I feel like it would have less impact with plain Docker, but with Kube managing the pods it ends up putting the cluster in an unhappy state with backoffs, because the exiting script looks like a crash. Thanks for the heads up on the motivation for exiting after a few runs - I'm going to modify the start shell script in my version of the Docker container. I think it makes sense to run the scheduler in a while loop but break if it returns a bad exit code, so Kube can still treat those incidents as real crashes (see the sketch below).
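
A minimal sketch of that wrapper, assuming the container's entrypoint just shells out to the stock airflow CLI:

#!/bin/sh
# Restart the scheduler in place when it exits cleanly after its -n runs,
# but propagate any non-zero exit code so Kubernetes still sees real
# crashes and applies CrashLoopBackOff only to those.
while true; do
  airflow scheduler -n 5
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "scheduler exited with code $rc, treating as a crash" >&2
    exit "$rc"
  fi
done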

@gsemet
Contributor

gsemet commented Dec 11, 2017

Feel free to submit a pull request. I do have my scheduler restarting regularly, and I don't see problems, except that it takes a few minutes to power on (so delaying the next DAG start).

@ryan-riopelle

ryan-riopelle commented Oct 31, 2018

The issue that I had with Kubernetes is that it tracks the number of restarts, so if you run this application indefinitely you could see large restart counts over a long period of time, which would be a red flag to an administrator who runs "kubectl get pods" on the cluster, unless I am understanding it wrong.

As a solution, maybe this pod could be run as a Kubernetes CronJob or Job.
The change in YAML would be similar to the below, but I have not fully debugged it yet.

Would this break the way the scheduler works?

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scheduler
  labels:
    app: airflow
    tier: scheduler
spec:
  schedule: "*/2 * * * *" # every 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: scheduler
            image: <image-location>
            # volumes:
            #     - /localpath/to/dags:/usr/local/airflow/dags
            env:
            - name: AIRFLOW_HOME
              value: "/usr/local/airflow"
            args: ["scheduler", "-n", "5"]

@aditinabar

aditinabar commented Sep 6, 2019

@gsemet How/where did you change the config for the scheduler to restart automatically? I'm not seeing it in airflow.cfg.

@Lord-Y

Lord-Y commented Sep 30, 2019

@gsemet when the scheduler arg n != -1, it will restart and then go into CrashLoopBackOff later. You can see it in the Helm chart.
