Restart issues found when upgrading 1.8 -> 2.1.0 #347

JeppeTh · 2024-06-24T17:06:19Z

In case process calling ranch:stop_listener is terminated in the middle of stop we end up in a limbo state.
I.e. here simulate that the stopping process is killed after calling supervisor:terminate_child

1> ranch:start_listener(my_server,ranch_tcp,#{},cowboy_tls,#{}).
{ok,<0.4006.0>}
2> supervisor:terminate_child(ranch_sup, {ranch_listener_sup, my_server}).
ok
3> ranch:stop_listener(my_server).
{error,not_found}
4> ranch:info(my_server).
** exception error: bad argument
in function ets:lookup_element/3
called as ets:lookup_element(ranch_server,{listener_sup,my_server},2)
*** argument 2: not a key that exists in the table
in call from ranch_server:get_listener_sup/1 (/local/gsnws/sgsn_mme/3pp/erlang/lib/ranch/src/ranch_server.erl, line 122)
in call from ranch:info/1 (/local/gsnws/sgsn_mme/3pp/erlang/lib/ranch/src/ranch.erl, line 443)
5> ranch:start_listener(my_server,ranch_tcp,#{},cowboy_tls,#{}).
{error,already_present}

Most likely introduced here:
9765f30
I.e. before this a 2nd call to ranch:stop_listener performed another full stop. That's not the case any more since ets tables seems cleaned -> stop is aborted by badarg

In another situation I managed to get into the same situation but where ranch:info/0 finds it - but not ranch:info/1. In the above case neiter ranch:info/0 finds its.

The text was updated successfully, but these errors were encountered:

JeppeTh · 2024-06-24T21:57:43Z

Probably the cleanup part should be separated into an try catch of its own. I.e. that data must be fetched before calling terminate_child. Then regardless of its outcome, always perform the legacy parts of stop_listener.

juhlig · 2024-06-25T08:56:10Z

#348 should fix this.

However, in the discussion in the linked commit, you mentioned that you could still "see" the listener in the output from ranch:info/0. This should not be possible, since this depends on the contents of the ranch_server table, which should be cleaned as soon as the process of the listener exits. If you really can find it in the ranch:info/0 output, this suggests a corruption of the data in the table. If you could provide steps to reproduce this scenario, it would be immensely helpful, otherwise I fear that I can't do much about it 🤷‍♂️

JeppeTh · 2024-06-25T09:01:12Z

That other case happened for my test tool - let's see if I get any time and luck reproducing that...

juhlig · 2024-06-25T09:31:26Z

This may be due to timing then... ranch_server monitors the listener processes, and when it receives a 'DOWN' message clears the related entries from the table. If the message sent by way of ranch:info/1,2 reaches the ranch_server process before the 'DOWN' message does, you will "see" the listener still there.

Put differently, if a call to ranch:info/1,2 does show your listener, another subsequent call may or may not show it. However, if a call to ranch:info/1,2 does not show your listener, another subsequent call must also not show it. (That is, of course, unless there was a call to ranch:start_listener in between).

JeppeTh · 2024-06-25T10:21:42Z

It wasn't that difficult to reproduce - here's a trace on ranch. Haven't analysed it.
trace.log

JeppeTh · 2024-06-25T10:23:53Z

Where you can see e.g.

12:16:41,878: (me@mymachine,<0.1973.0>) return ranch:info/0 ->
#{{myserver,{10,0,4,48},5022} =>

12:17:03,534: (me@mymachine,<0.1980.0>) return ranch:stop_listener/1 ->
{error,not_found}

12:17:19,778: (me@mymachine,<0.1983.0>) return ranch:start_listener/5 ->
{error,{already_started,<0.1965.0>}}

JeppeTh · 2024-06-25T10:33:33Z

But perhaps this is not related to the mentioned change. Or at least running the old stop_listener doesn't help. Still stuck in limbo

JeppeTh · 2024-06-25T10:44:39Z

Sorry this side track was all wrong. I now realized the ranch Ref wasn't just the name it's {Name, Ip, Port} 🙈

JeppeTh referenced this issue Jun 24, 2024

Delete local socket file when a listener closes

9765f30

juhlig linked a pull request Jun 25, 2024 that will close this issue

Fix halfway-stopping of listeners #348

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restart issues found when upgrading 1.8 -> 2.1.0 #347

Restart issues found when upgrading 1.8 -> 2.1.0 #347

JeppeTh commented Jun 24, 2024 •

edited

Loading

JeppeTh commented Jun 24, 2024 •

edited

Loading

juhlig commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

juhlig commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024 •

edited

Loading

Restart issues found when upgrading 1.8 -> 2.1.0 #347

Restart issues found when upgrading 1.8 -> 2.1.0 #347

Comments

JeppeTh commented Jun 24, 2024 • edited Loading

JeppeTh commented Jun 24, 2024 • edited Loading

juhlig commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

juhlig commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024

JeppeTh commented Jun 25, 2024 • edited Loading

JeppeTh commented Jun 24, 2024 •

edited

Loading

JeppeTh commented Jun 24, 2024 •

edited

Loading

JeppeTh commented Jun 25, 2024 •

edited

Loading