Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/use gunicorn in dev mode #19

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

sergiofenoll
Copy link

@sergiofenoll sergiofenoll commented Nov 16, 2023

This PR changes the template so that the same mechanism (gunicorn) that is used for running in production is also used in development mode. This should ensure that the behaviour of services using this template is consistent between development mode and production mode.

Why

With the previous setup, in development mode the template would use Flask to serve the code, which meant that route handlers can take as long as you want, whereas in production mode there's a timeout for workers set by Gunicorn, so a route handler can't take more than a certain amount of time before it gets aborted.
This disparity is quite annoying because you might make something that works perfectly locally in dev-mode and then as soon as you deploy it you notice it breaks with errors you can't seem to reproduce locally.

How

Gunicorn provides a --reload option so that we don't lose the live code reload functionality of the template, but to get this working it was necessary to replace the worker class. It seems that the workers we used meinheld don't work well alongside gunicorn's reload mechanism: notice that in this issue the reporter is using meinheld workers. Meinheld itself also seems quite unmaintained, as the last commit for it was over 3 years ago and its official website has been squatted, so it might make sense to replace the worker class regardless.

Warning

I know next to nothing about Gunicorn's architecture and the working of WSGI servers. I'm not really sure if the suggested worker is "better" than what we're currently using. So thorough testing with existing services is needed before merging this.

Nice extras

A lot of our services make use of apscheduler to build cronjobs. Due to how Flask's debug mode interacted with this library, in development mode usage of this library would result in 2 schedulers being instantiated. Every cron job would run twice. Although this didn't happen in production mode, it could be quite annoying/confusing when developing using this template. Switching to gunicorn for development mode has inadvertently resolved this. For some extra context see this StackOverflow post.

Encountered issues

Gunicorn's --reload mechanism seems to be broken for certain errors: benoitc/gunicorn#2244
This would mean that the reloader stops working if there's e.g. a syntax error in your file while developing a service, which is quite likely to happen. To be determined if this gets fixed if we use a newer version of gunicorn (the currently used version is 2 years old). Adding inotify enables that mechanism for the reloader, in which case it can survive the mentioned errors.
By changing the service-code import mechanism we can circumvent this issue. The reloader tracks the files it needs to listen to based on imports. Because we used to set the import inside a try-except block, the reloader wouldn't correctly pick up file changes if importing the file throws. By moving this import outside of the try-except block, we keep the reloader happy and also "fix" #12: a missing dependency or a syntax error in service-code will not silently print a single log line and set up a web server with no endpoints, instead it crashes the service.

The meinheld worker seems to be unmaintained and it doesn't work with
gunicorn's reload mechanism.
@sergiofenoll sergiofenoll force-pushed the feature/use-gunicorn-in-dev-mode branch from c1bc262 to 5981fc6 Compare November 16, 2023 22:24
Having inotify installed enables gunicorn's reload mechanism to survive
through syntax errors and such in the service code.
Using gunicorn's reload mechanism, importing the code in a try-except
meant that the reloader would not pick up changes if the initial code
import threw an error. The reloader checks which files it needs to
listen to notifications for based on its imports. It seems that the
combination of a throwing import inside a try block makes it so the
reloader doesn't register the service entrypoint as a file it needs to
listen to.
We no longer use Flask in dev mode, so this if block is never entered.
@MikiDi
Copy link

MikiDi commented Nov 17, 2023

I support this, since it only seems to make things better! The Gunicorn/Meinheld setup was introduced here, but was no more than a choice for a WSGI server rather than a specific commitment to Meinheld workers. Using this PR image in some of our frequently used dev setups will be the best way to test this further if we want to build more confidence before merging.

@nvdk
Copy link
Member

nvdk commented Dec 11, 2023

@sergiofenoll (or someone with access to this repo): could you set up a build? that will make it easier to test 👼

@sergiofenoll
Copy link
Author

@nvdk I don't think I can enable the build under the semtech namespace (I think someone with write rights for the organization/repo would have to recreate the PR and use that to push).

I have enabled the Woodpecker on my fork though, you should be able to use this image:
sergiofenoll/mu-python-template:feature-use-gunicorn-in-dev-mode

@sergiofenoll
Copy link
Author

This is something I've noticed while using this locally: while SyntaxErrors get handled properly by the reloading mechanism, but other errors like NameErrors (which can occur from something as simple as trying to use a variable that hasn't been defined) cause the whole service to crash. See: benoitc/gunicorn#2746

We could circumvent this by setting restart: always in the docker compose file when developing a Python template, but that's less than ideal. It might be worth it to look into adding an external reloader (like the JavaScript template has with nodemon) or implementing something using watchman or something like that.

@madnificent
Copy link
Member

We have pulled this in and made a feature build with the tag feature-use-gunicorn-in-dev-mode and intend to merge master in there as we merge more things into master. This should allow for some quicker tests.

@madnificent
Copy link
Member

We could not easily make the service itself crash, even with using variables which don't exist (obviously, the request then errors but there's a nice stack trace). This is likely due to upgrades in gunicorn itself. The base image gets updated regularly and the base image we used was pushed two days ago. 👍

@sergiofenoll
Copy link
Author

@madnificent here's a screencast showing how the dev server can fully crash without automatically rebooting:

Screencast.from.2024-10-09.12-13-51.webm

If the screencast isn't clear I can explain further in a call if needed.

The example is a bit convoluted, most people won't be typing my_var_that_does_not_exist_and_will_cause_the_server_to_crash into web.py, but it shows that it's simple enough to get it to crash. If you're in the middle of writing a line of code and you (auto)save your file, you might have the service completely crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants