
Proposing an easy method for graceful application shutdown #198

Open
drone115b opened this issue Jul 18, 2018 · 16 comments

Comments

@drone115b

I think there's an appetite for waitress to be able to shut down applications gracefully. I use waitress independently of Pylons for a wide variety of micro-services.

On line 263 of server.py, SystemExit and KeyboardInterrupt exceptions are caught to shut down the task dispatcher.

I am not very fluent with the code, but if there were a "raise" call inserted after line 264, then it seems that higher-level exception handlers could catch the same exceptions and shut down the application object gracefully.

Is there an argument against this? Have I misunderstood the code?

I think the pros are: a 1-line (1-word!) solution, highly backward compatible, and it seems like it would address graceful application shutdown robustly.
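For illustration, this is the calling pattern the re-raise would enable (a sketch: today waitress.serve() swallows these exceptions and returns normally, so the except branch is only reachable with the proposed change; shutdown_cleanly is a hypothetical application hook):

import waitress

def app(environ, start_response):
    # stand-in WSGI application
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

def shutdown_cleanly(reason):
    # hypothetical application-level cleanup: close pools, flush queues, ...
    print("shutting down:", reason)

try:
    waitress.serve(app, host="127.0.0.1", port=8080)
except (KeyboardInterrupt, SystemExit) as exc:
    # only reachable if waitress re-raised after its own cleanup
    shutdown_cleanly(type(exc).__name__)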

@drone115b
Author

There are a couple of exception handlers that would need to be modified, but before I do it and send a PR I would love to have feedback from someone deeper in the design of the thing.

@digitalresistor
Member

Your application (the WSGI app) is running inside a bunch of threads, and is not the one that sees the KeyboardInterrupt that is caught in server.py.

Your application won't ever receive that exception or be able to handle it. Since the application object is a WSGI application, there is no way to notify it that shutdown is in progress, and waitress doesn't expose any handlers that would allow custom code to be executed.

This means there is no way to signal your application that it is going away (and even if there were, there is no good way to signal each thread, which may be deep inside your WSGI application).


Now, if by graceful shutdown you mean "continue servicing existing requests, but don't accept new ones", similar open PRs exist:

#48
#56

Neither of these will likely apply to the current waitress (since there were some changes related to how it creates listen sockets), but I would be interested in seeing how to make graceful shutdown happen that way.

@drone115b
Author

drone115b commented Jul 21, 2018

I think we misunderstand each other. In my deployment, waitress itself seems to shut down satisfactorily -- at least I see no evidence of lingering effects when I send it a kill signal (CentOS 7.3, Python 2.7.15). The tickets you cited seem focused on waitress itself.

I am specifically talking about shutting down the WSGI application that waitress services. My workaround for the moment is to apply shutdown operations after the waitress.serve() call returns, but I would prefer to wrap the waitress.serve() call in an exception handler and catch KeyboardInterrupt and SystemExit (et al.) myself, to have finer control over why I'm actually shutting down the application. Waitress seems to be receiving the signals; I'm not sure why you're confident that re-raising them at the end of waitress' handler would not allow them to be processed at the outer scope. You're more in the weeds of the implementation, so I trust your judgement - it's just not immediately obvious.
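A minimal sketch of that workaround, reusing the stand-in app from the earlier sketch (do_shutdown is a hypothetical cleanup hook):

import waitress

# serve() catches KeyboardInterrupt/SystemExit internally and returns
# normally, so cleanup can only run after the call, with no way to tell
# why the server stopped.
waitress.serve(app, host="127.0.0.1", port=8080)
do_shutdown()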

In any case, I may look at #56 in a few weeks. I can't promise anything, but I appreciate that this is a volunteer effort.

Thanks.

@digitalresistor
Member

#56 wouldn't help you either, since that is related to how long waitress will wait before it shuts down connections hard.

@viktordick

@drone115b What do you expect the application to do if the server intends to shut down? Since waitress is usually the outermost layer of the WSGI stack, it cannot raise exceptions into the application, which is probably the lowermost layer.

It might be possible to add a signal similar to the one implemented in #310, so that the application could check whether the server intends to shut down and somehow react to it. But I don't know how helpful that would be. Either the application decides to finish the request and write the response, in which case it probably has no way to get faster just because it knows the server is shutting down (if it can be faster, why isn't it always faster?); or it decides to abort execution, which it will also do anyway once any graceful shutdown period runs out and waitress shuts down.

The only case I could imagine is if the application knows that the next step in its execution will take a long time and decides it might as well not start, since the server is going to shut down anyway. Is this a use case you have?

@bertjwregeer I would like to start discussing possible implementations of a graceful shutdown that a) serves the remaining requests without accepting new ones and b) can be triggered both by receiving a signal and by an exception raised from within the application. In your opinion, should I open a new issue or continue the discussion in this one, which seems to have been started with a different goal in mind?

@digitalresistor
Member

@viktordick This is what needs to be done to add graceful shutdown, which would allow waitress to finish serving any remaining connections: #269 (comment)

If you want to work on that, by all means. I feel uneasy about anything that implements the second part of your b), though; I am not a huge fan of an app being able to send a signal back to shut things down.

@mmerickel
Member

I agree with @bertjwregeer that signaling the application should not be part of this initial work. It's a valid concern, but initially the focus should be on:

  • stop accepting new connections
  • stop reading new requests from active channels
  • finish processing received requests
  • close the connections
  • destroy the event loop

This behavior should be triggered from an API on the server object, like server.graceful_shutdown(), which can be used for integration via code or from a SIGTERM handler. Finally, I'd expect an adjustment in waitress that specifies whether or not to register the handler automatically.
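Hypothetically, the integration could look like the sketch below (server.graceful_shutdown() does not exist in waitress today; it stands in for the proposed API):

import signal
from waitress.server import create_server

def app(environ, start_response):
    # stand-in WSGI application
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

server = create_server(app, host="0.0.0.0", port=8080)

def on_sigterm(signum, frame):
    # proposed: stop accepting, drain active requests, then tear down
    server.graceful_shutdown()

signal.signal(signal.SIGTERM, on_sigterm)
server.run()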

As far as a timeout on the graceful shutdown goes, there's not much to say here, because no matter what, waitress is thread-based and cannot kill active threads. We should rely on the process manager to follow standard graceful shutdown logic, which is to send SIGTERM, wait X seconds, then send SIGKILL. Graceful is graceful, and waitress cannot stop connections that don't want to stop because an app is processing them. Once we're at that point, we can look at how to help the app shut down if it wants to care.

@viktordick

All right, I agree that we can drop idea b). If required, the app can simply send SIGTERM to its own process.

When implementing a graceful shutdown, I usually use the pattern that a second SIGTERM or SIGINT terminates immediately. That way, if the process is started directly from the command line for debugging and a request is not getting finished, hitting Ctrl+C will say "Terminating", and if it still takes too long, I can hit Ctrl+C a second time.

I do not have any experience with signal handling on systems other than Linux. Are other systems even supported?

I hope I will find some time in the next week to work on this.

@mmerickel
Member

@viktordick I agree with your proposed behavior. The idea is that the graceful signal handler is simply unregistered after the first signal is received; the second signal then triggers the default behavior in the process.
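A minimal sketch of that unregister-on-first-signal pattern, independent of waitress (begin_graceful_shutdown is a hypothetical drain routine):

import signal

def begin_graceful_shutdown():
    # hypothetical: stop accepting connections, drain in-flight requests
    pass

def graceful_handler(signum, frame):
    # Restore the default disposition so a second SIGINT/SIGTERM
    # terminates the process immediately.
    signal.signal(signum, signal.SIG_DFL)
    print("Terminating")
    begin_graceful_shutdown()

for sig in (signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, graceful_handler)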

@mmerickel
Member

I don't know how realistic this is, but if there's a way to split this feature into a couple of PRs, that would be really great, because I think it's going to involve touching quite a lot of waitress and become difficult to review. Anyway, just something to think about.

As a friendly warning, I think this shutdown/cleanup work is directly related to the many open sockets left around by the test suite as it runs, since those tests do not clean up the running event loop. That issue should be solvable as part of this work, though maybe not necessarily in the same PR. Just a thing to watch out for.

@viktordick

I am afraid I will not be able to work directly on this in the near future after all. We decided to first use a different method: unregistering the backend from the load balancer (in our case HAProxy), waiting until the backend has no requests left, and then restarting the waitress process before activating the backend again. With this strategy, I am more confident that I know how to implement each step along the way (and it will also work for older Zope2 installations).
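For reference, that drain-and-restart sequence could be scripted against HAProxy's runtime API roughly like this (a sketch; the admin socket path and the be_app/waitress1 backend/server names are assumptions):

import socket
import time

HAPROXY_SOCK = "/var/run/haproxy.sock"  # assumed admin socket path

def haproxy_cmd(cmd):
    # Send one command to HAProxy's runtime API over its UNIX admin socket.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(HAPROXY_SOCK)
        s.sendall(cmd.encode() + b"\n")
        return s.recv(65536).decode()

# 1. Stop routing new requests to the waitress backend.
haproxy_cmd("set server be_app/waitress1 state drain")

# 2. Wait until the server reports no current sessions
#    (scur is the fifth field of `show stat` CSV output).
while True:
    stats = haproxy_cmd("show stat")
    row = next(line for line in stats.splitlines()
               if line.startswith("be_app,waitress1,"))
    if int(row.split(",")[4]) == 0:
        break
    time.sleep(1)

# 3. Restart the waitress process here, then put the server back.
haproxy_cmd("set server be_app/waitress1 state ready")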

I hope to still return to the problem in waitress itself, but I am afraid it will not be anytime soon.

@mmerickel
Member

@viktordick Thank you for keeping us in the loop, and thanks for all your efforts!

@dbakit

dbakit commented Jan 14, 2022

Use create_server() instead of serve() and handle signals yourself; that will be more graceful. For example:

import logging
import signal

from waitress.server import create_server

if __name__ == '__main__':
    #
    # some deleted code.....
    # (assumed to define `app`, `worker`, and `metric_port`)
    #

    # Signal handler: run application cleanup before shutting down the server.
    def handle_sig(sig, frame):
        logging.warning(f"Got signal {sig}, now closing worker...")
        worker.close()   # application-level cleanup (from the elided code)
        server.close()   # closes the waitress server; run() then returns

    # Note: SIGQUIT and SIGHUP are POSIX-only and unavailable on Windows.
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGQUIT, signal.SIGHUP):
        signal.signal(sig, handle_sig)

    server = create_server(app, host="0.0.0.0", port=metric_port)
    server.run()

@Rudd-O

Rudd-O commented Sep 24, 2022

There should also be a distinction between INT and TERM, such that TERM makes the application stop listening on the socket for new connections but finish serving the existing open ones, while INT tells the app to "just drop everything on the floor". Ideally the application would also let the orchestrator know when it is finally quiescent (probably through an HTTP mechanism).

Waitress-based applications, containerized for use on Kubernetes, need this behavior. Otherwise, killing the app will simply break active connections.

@mmerickel
Member

For graceful shutdown the focus is really on handling SIGTERM, as that's the standard signal used for that purpose. There are no required semantics around SIGINT. If I were implementing this feature, I would treat both SIGINT and SIGTERM as a graceful shutdown, which is consistent with other systems I have used that support the feature.

@Rudd-O

Rudd-O commented Oct 1, 2022

Yeah, I wouldn't mind if INT is not "properly handled", as long as SIGTERM is a graceful shutdown with all connections finishing up properly. I require this for my automation, and Kubernetes folks require it too.

NexediGitlab pushed a commit to Nexedi/erp5 that referenced this issue Mar 2, 2023
NexediGitlab pushed a commit to Nexedi/erp5 that referenced this issue Mar 3, 2023