
Fix halfway-stopping of listeners #348

Open · wants to merge 4 commits into master from fix_half-stopped_listeners

Conversation

@juhlig (Contributor) commented Jun 25, 2024

Fixes #347.

The changes in this PR actually do two things:

  • by executing the stopping procedure in a separate process, a crash of the process calling ranch:stop_listener at an unfortunate time (namely, between the supervisor:terminate_child and supervisor:delete_child calls involved) should no longer prevent the stopping procedure from running to completion
  • by separating the fetching of the transport and transport options from the actual stopping, fully stopping a listener that was (for whatever reason) terminated but not deleted should now complete
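For illustration, the first point could be sketched roughly as follows. This is a sketch only, not the actual PR code; the child id {ranch_listener_sup, Ref} under ranch_sup follows ranch's supervision layout, but the function body and names are illustrative assumptions:

```erlang
%% Sketch: run terminate_child/delete_child in a dedicated process, so a
%% crash of the caller between the two calls cannot leave the listener
%% half-stopped. The caller monitors that process and waits for the result.
stop_listener(Ref) ->
    {Pid, MRef} = spawn_monitor(
        fun() ->
            ok = supervisor:terminate_child(ranch_sup, {ranch_listener_sup, Ref}),
            ok = supervisor:delete_child(ranch_sup, {ranch_listener_sup, Ref})
        end),
    receive
        {'DOWN', MRef, process, Pid, normal} -> ok;
        {'DOWN', MRef, process, Pid, Reason} -> {error, Reason}
    end.
```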

There are no tests yet (and for the former case, I don't see how we could reliably create the scenario of the calling process crashing at just the right time). I want to hear what you think about this approach first before investing more time.

* if the process calling ranch:stop_listener crashes before finishing,
  the stopping is still executed completely
* if a listener is terminated but not deleted, calling ranch:stop_listener
  removes the remnant
@zuiderkwast (Contributor)

Hi Jan!

In #347, @JeppeTh outlined a test case as an Erlang shell session. It involves supervisor:terminate_child to simulate the problem and put ranch in the problematic state.
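For readers following along, a shell session of the kind described might look like this (the listener name, port and protocol module are made up for the example):

```erlang
%% Put ranch into the problematic half-stopped state by hand:
1> {ok, _} = ranch:start_listener(example, ranch_tcp,
       #{socket_opts => [{port, 5555}]}, echo_protocol, []).
2> supervisor:terminate_child(ranch_sup, {ranch_listener_sup, example}).
ok
%% The listener is now terminated but not deleted ("half-stopped").
3> ranch:stop_listener(example).
%% Per issue #347, stopping such a remnant may not complete cleanly.
```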

@juhlig (Contributor, Author) commented Jun 25, 2024

> Hi Jan!

Hi Viktor, long time no see 😅

> In #347, @JeppeTh outlined a test case as an Erlang shell session. It involves supervisor:terminate_child to simulate the problem and put ranch in the problematic state.

Yes, that would be a blueprint for a test case for my second point, when a listener is already in the halfway-stopped state.

The hard-to-test part revolves around my first point, which should make it impossible to put a listener in that state even by "normal means", i.e. by crashing at unfortunate times =^^=

@juhlig (Contributor, Author) commented Jun 27, 2024

@essen? WDYT of the approach I used and the PR in general?

@essen (Member) commented Jun 28, 2024

Why isn't this done in ranch_server instead?

@juhlig (Contributor, Author) commented Jul 2, 2024

> Why isn't this done in ranch_server instead?

Sure, could be done, not a bad idea :)
It would be a first, though: up to now, ranch_server only does some bookkeeping on the listeners, i.e. it doesn't really do anything itself.

I'll make a draft to show to you, maybe tomorrow.

@juhlig (Contributor, Author) commented Jul 2, 2024

> I'll make a draft to show to you, maybe tomorrow.

... or maybe today =^^= WDYT?

@essen (Member) commented Jul 2, 2024

I would not move the stop logic to ranch_server but would instead add a way for it to execute a callback and return whatever the result is. That way we don't need to worry about the details in ranch_server and it can be reused in the future if necessary. But mostly to keep the stop code all in one place in ranch really.

Something like gen_server:call({rpc, M, F, A}). To be honest I wonder if erpc or something else can be used instead.
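The suggested mechanism could look roughly like this inside ranch_server (a sketch of the idea, not actual ranch code; do_stop_listener is a placeholder name):

```erlang
%% Sketch: ranch_server executes an arbitrary MFA on behalf of the caller
%% and returns whatever the result is. Fragment of a gen_server callback.
handle_call({rpc, M, F, A}, _From, State) ->
    {reply, apply(M, F, A), State};

%% The ranch module would then keep the stop logic and delegate, e.g.:
%%   gen_server:call(ranch_server, {rpc, ranch, do_stop_listener, [Ref]}).
```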

@juhlig (Contributor, Author) commented Jul 3, 2024

Ah, ok, so you mean leave the stopping code in the ranch module, but run it in the ranch_server process, instead of in a process spawned from the process calling stop_listener, correct?

The way you propose would enable anyone to run arbitrary code in the ranch_server process. I'm not sure that is a good thing; it gives me an uneasy feeling 🤷‍♂️
At the least, it would enable crashing the ranch_server process (e.g. {rpc, erlang, exit, [foo]}). We could wrap the apply in a try...catch to defend against that.
The given MFArgs could also stall ranch_server (e.g. {rpc, timer, sleep, [infinity]}). We could run the given MFArgs in a spawned process to defend against that, but then we could just as well spawn from the process calling ranch:stop_listener.
(Back in the good old days of IRC, I remember you saying something like "if users want to shoot themselves in the foot, we let them", so I'm just pointing it out.)
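The try...catch defense mentioned here could look roughly like this (sketch only; it guards against crashes but, as noted, not against a stalling MFA):

```erlang
%% Sketch: protect ranch_server from a crashing MFA with try...catch.
%% Fragment of a gen_server callback; names are illustrative.
handle_call({rpc, M, F, A}, _From, State) ->
    Reply = try
                apply(M, F, A)
            catch
                Class:Reason ->
                    {error, {Class, Reason}}
            end,
    {reply, Reply, State};
```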

erpc, by which I think you mean specifically erpc:call, hm. In a nutshell, what it does is a fancy spawn, which would make it a viable alternative to the ranch_server rpc thing you proposed.
On closer inspection, however, erpc:call takes a shortcut when called with the local node as the first argument: in that case it does a plain apply, running the given MFArgs in the context of the calling process. That leaves us with the same problem we have now: if the calling process exits at an awkward point (i.e., between the applied MFArgs calling terminate_child and delete_child), we end up in the same halfway-stopped listener scenario.
(I think this behavior can actually be called a bug; I'll probably create an issue with OTP about it later.)

@essen (Member) commented Jul 3, 2024

> Ah, ok, so you mean leave the stopping code in the ranch module, but run it in the ranch_server process, instead of in a process spawned from the process calling stop_listener, correct?

Yes.

> The way you propose would enable anyone to run any code in the ranch_server process, not sure if that is a good thing, it gives me an uneasy feeling though 🤷‍♂️

We can restrict the M and F to ranch and do_stop_listener or whatever the name is, for now.
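Restricting the allowed M and F could be as simple as matching on them in the clause head (sketch; do_stop_listener is the placeholder name from the comment above):

```erlang
%% Sketch: only allow the one MFA that ranch needs; reject everything else.
%% Fragment of a gen_server callback.
handle_call({rpc, ranch, do_stop_listener, Args}, _From, State) ->
    {reply, apply(ranch, do_stop_listener, Args), State};
handle_call({rpc, _, _, _}, _From, State) ->
    {reply, {error, forbidden}, State};
```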

@zuiderkwast (Contributor)

@essen What's the point of this extra abstraction? To me it looks like over-engineering.

Why not just add the simple protection (or workaround, if we consider it an OTP bug) for the fact that supervisor:terminate_child and supervisor:delete_child can't be called together as an atomic unit?

@essen (Member) commented Jul 3, 2024

We need to do it in a process that won't be the caller. It can either be a random process that we spawn specifically for this, or ranch_server, which already exists. Better the latter: it's easy to find, trace, debug and whatever else, and we don't have to reimplement a call mechanism, monitors, etc. like we would with a plain spawn. It's not fire-and-forget; we have to wait for the listener to be stopped before returning.

Not sure which part is over engineering.

@essen (Member) commented Jul 3, 2024

If erpc applies its optimisation only for the infinity timeout, then it could be used? Maybe use a large timeout, larger than the calls stop_listener makes.
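That would amount to something like the following call from ranch (sketch; the 5-minute timeout is an arbitrary "large" value and do_stop_listener is the placeholder name used earlier):

```erlang
%% Sketch: a finite timeout avoids erpc's local-node shortcut (assuming,
%% as discussed, the shortcut only applies with the infinity timeout),
%% so the MFA runs in a separate process even on the local node.
erpc:call(node(), ranch, do_stop_listener, [Ref], 300000).
```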

@zuiderkwast (Contributor)

Thanks, makes sense.

@juhlig force-pushed the fix_half-stopped_listeners branch from 1c3b4f7 to de6d4b6 on July 4, 2024 07:45
@juhlig force-pushed the fix_half-stopped_listeners branch from de6d4b6 to a2c959e on July 4, 2024 08:35
@essen (Member) commented Jul 4, 2024

Looks good, thanks! I'm off for the next two weeks but I'll look into merging and releasing a new version when I get back.

@juhlig (Contributor, Author) commented Jul 4, 2024

> Looks good, thanks! I'm off for the next two weeks but I'll look into merging and releasing a new version when I get back.

Have a nice vacation then 👋

@juhlig (Contributor, Author) commented Oct 21, 2024

@essen what about this PR? 3+ months have passed 😅

Successfully merging this pull request may close these issues.

Restart issues found when upgrading 1.8 -> 2.1.0