-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Implement Graceful Shutdown in Native worker #23517
base: master
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this code. @anandamideShakyan : Please can you add a unit test (maybe in ServerOperationsTest or a new unit test file) for this functionality.
@aditi-pandit Sure, I will add the unit test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anandamideShakyan Prestoserver handles graceful shutdown mechanism with SIGTERM, which is a standard way to shut down any process. Here is how the signal handler is registered (https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/PrestoServer.cpp#L152)
Also in the stop method], server goes in graceful shotdown mode
As a result, no explicit endpoint is required to ask the server to shut down. Just sending SIGTERM like any other process works as expected. In Java, perhaps these signal handling is not possible, hence this end point concept came into development. Native worker even have additional signal handling mechnism to catch debugging information in case of SEGFault (https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/PrestoMain.cpp#L25C27-L25C47)
CC: @spershin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this.
Please refer to this issue : #23299 |
@anandamideShakyan Commented in the issue as well |
ff8a9f2
to
f787527
Compare
a6bbcb4
to
c1d9c3d
Compare
@@ -1404,6 +1413,28 @@ void PrestoServer::reportNodeStatus(proxygen::ResponseHandler* downstream) { | |||
http::sendOkResponse(downstream, json(fetchNodeStatus())); | |||
} | |||
|
|||
void PrestoServer::handleGracefulShutdown(const std::vector<std::unique_ptr<folly::IOBuf>>& body, proxygen::ResponseHandler* downstream){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format check is failing. Please fix this (see the Makefile having a format-fix target).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -1404,6 +1413,28 @@ void PrestoServer::reportNodeStatus(proxygen::ResponseHandler* downstream) { | |||
http::sendOkResponse(downstream, json(fetchNodeStatus())); | |||
} | |||
|
|||
void PrestoServer::handleGracefulShutdown(const std::vector<std::unique_ptr<folly::IOBuf>>& body, proxygen::ResponseHandler* downstream){ | |||
if (body.size()==1 && body[0]->moveToFbString() == "\"SHUTTING_DOWN\"") { | |||
// Print message and initiate shutdown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need all these comments as they explains what happens which is apparent from the code. We use the Velox comment guidelines: https://github.com/facebookincubator/velox/blob/main/CONTRIBUTING.md#coding-best-practices (see "Code Comments" section).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree.
c1d9c3d
to
1f9ab83
Compare
Please add a unit test as Aditi requested above. |
9502bfe
to
6325ff8
Compare
@majetideepak I have added the unit test GracefulShutdownTest.cpp. |
b9a39ef
to
4186e82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anandamideShakyan can you check if can use PrestoServerOperations.h?
See the test presto_cpp/main/tests/ServerOperationTest.cpp
.
I see that Aditi referenced this as well.
@majetideepak I had to add GracefulShutdownTest.h as a separate file because if we are running the presto server after some test cases we get error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some test cases we get error The memory manager has already been set
This should not happen. Can you share a reproducer?
Let's extend class PrestoServerOperations
if you need to launch presto server in a separate thread.
// Give coordinator some time to receive our new node state and stop sending | ||
// any tasks. | ||
std::this_thread::sleep_for(std::chrono::seconds(shutdownOnsetSec)); | ||
std::thread shutdownThread([this, shutdownOnsetSec]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the motivation for a separate shutdown thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were not able to read the server status if we didn't schedule the shutdown in a separate thread because before we could read the status, server was shutting down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to read this status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need it to compare it against the status codes in the test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again not ideal to make changes only for testing.
@@ -157,7 +158,9 @@ PrestoServer::PrestoServer(const std::string& configDirectoryPath) | |||
|
|||
PrestoServer::~PrestoServer() {} | |||
|
|||
void PrestoServer::run() { | |||
void PrestoServer::run( | |||
std::function<void(proxygen::HTTPServer*)> onSuccess, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these arguments used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use it inside PrestoServerWrapper.cpp to get the socket address of the server that we need while running the test suite(see SetUpTestSuite()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not ideal to make API changes just for testing. Let's figure out another way to test.
@majetideepak
While working on the issue I discussed the errors that I was getting with @nmahadevuni, he suggested me to make gracefulshutdown as a separate file and a separate executable. This solved the issue. Aditi also said that I can use a new file. |
@majetideepak , I can add the test suite as a mock test. But it still won't test the whole client->RESTEndpoint->response. |
@anandamideShakyan We already have the setup for such e2e in the Java tests. Can we add a test there? |
1acfb72
to
32f1458
Compare
…e SHUTING_DOWN state for graceful shutdown
70e697a
to
2879ca5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anandamideShakyan This looks great! I have 2 minor comments.
folly::trimWhitespace(body[0]->moveToFbString()) == "\"SHUTTING_DOWN\"") { | ||
LOG(INFO) << "Shutdown requested"; | ||
if (nodeState() == NodeState::kActive) { | ||
std::thread([this]() { this->stop(); }).detach(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we invoke stop in a separate thread? Please add a comment.
return PrestoNativeQueryRunnerUtils.createNativeQueryRunner(true); | ||
} | ||
|
||
private String getServerUri(QueryRunner queryRunner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need a function if it is used only once?
Description
Added PUT method in /v1/info/state for transitioning the server to the SHUTTING_DOWN state for graceful shutdown
Motivation and Context
The Prestissimo project currently lacks support for transitioning the server to the SHUTTING_DOWN state via the PUT method in the /v1/info/state endpoint. Implementing this feature allows for a graceful shutdown of the server, ensuring that ongoing processes are completed without abrupt termination.
Impact
This implementation introduces a new feature that enables the server to handle a graceful shutdown when the SHUTTING_DOWN state is triggered via a PUT request. By supporting this functionality, the server can now properly manage ongoing requests before shutting down, reducing the risk of data loss or inconsistency.
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: