Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restart issues found when upgrading 1.8 -> 2.1.0 #347

Open
JeppeTh opened this issue Jun 24, 2024 · 8 comments · May be fixed by #348
Open

Restart issues found when upgrading 1.8 -> 2.1.0 #347

JeppeTh opened this issue Jun 24, 2024 · 8 comments · May be fixed by #348

Comments

@JeppeTh
Copy link

JeppeTh commented Jun 24, 2024

In case process calling ranch:stop_listener is terminated in the middle of stop we end up in a limbo state.
I.e. here simulate that the stopping process is killed after calling supervisor:terminate_child

1> ranch:start_listener(my_server,ranch_tcp,#{},cowboy_tls,#{}).
{ok,<0.4006.0>}
2> supervisor:terminate_child(ranch_sup, {ranch_listener_sup, my_server}).
ok
3> ranch:stop_listener(my_server).
{error,not_found}
4> ranch:info(my_server).
** exception error: bad argument
in function ets:lookup_element/3
called as ets:lookup_element(ranch_server,{listener_sup,my_server},2)
*** argument 2: not a key that exists in the table
in call from ranch_server:get_listener_sup/1 (/local/gsnws/sgsn_mme/3pp/erlang/lib/ranch/src/ranch_server.erl, line 122)
in call from ranch:info/1 (/local/gsnws/sgsn_mme/3pp/erlang/lib/ranch/src/ranch.erl, line 443)
5> ranch:start_listener(my_server,ranch_tcp,#{},cowboy_tls,#{}).
{error,already_present}

Most likely introduced here:
9765f30
I.e. before this a 2nd call to ranch:stop_listener performed another full stop. That's not the case any more since ets tables seems cleaned -> stop is aborted by badarg

In another situation I managed to get into the same situation but where ranch:info/0 finds it - but not ranch:info/1. In the above case neiter ranch:info/0 finds its.

@JeppeTh
Copy link
Author

JeppeTh commented Jun 24, 2024

Probably the cleanup part should be separated into an try catch of its own. I.e. that data must be fetched before calling terminate_child. Then regardless of its outcome, always perform the legacy parts of stop_listener.

@juhlig juhlig linked a pull request Jun 25, 2024 that will close this issue
@juhlig
Copy link
Contributor

juhlig commented Jun 25, 2024

#348 should fix this.

However, in the discussion in the linked commit, you mentioned that you could still "see" the listener in the output from ranch:info/0. This should not be possible, since this depends on the contents of the ranch_server table, which should be cleaned as soon as the process of the listener exits. If you really can find it in the ranch:info/0 output, this suggests a corruption of the data in the table. If you could provide steps to reproduce this scenario, it would be immensely helpful, otherwise I fear that I can't do much about it 🤷‍♂️

@JeppeTh
Copy link
Author

JeppeTh commented Jun 25, 2024

That other case happened for my test tool - let's see if I get any time and luck reproducing that...

@juhlig
Copy link
Contributor

juhlig commented Jun 25, 2024

This may be due to timing then... ranch_server monitors the listener processes, and when it receives a 'DOWN' message clears the related entries from the table. If the message sent by way of ranch:info/1,2 reaches the ranch_server process before the 'DOWN' message does, you will "see" the listener still there.

Put differently, if a call to ranch:info/1,2 does show your listener, another subsequent call may or may not show it. However, if a call to ranch:info/1,2 does not show your listener, another subsequent call must also not show it. (That is, of course, unless there was a call to ranch:start_listener in between).

@JeppeTh
Copy link
Author

JeppeTh commented Jun 25, 2024

It wasn't that difficult to reproduce - here's a trace on ranch. Haven't analysed it.
trace.log

@JeppeTh
Copy link
Author

JeppeTh commented Jun 25, 2024

Where you can see e.g.

12:16:41,878: (me@mymachine,<0.1973.0>) return ranch:info/0 ->
#{{myserver,{10,0,4,48},5022} =>

12:17:03,534: (me@mymachine,<0.1980.0>) return ranch:stop_listener/1 ->
{error,not_found}

12:17:19,778: (me@mymachine,<0.1983.0>) return ranch:start_listener/5 ->
{error,{already_started,<0.1965.0>}}

@JeppeTh
Copy link
Author

JeppeTh commented Jun 25, 2024

But perhaps this is not related to the mentioned change. Or at least running the old stop_listener doesn't help. Still stuck in limbo

@JeppeTh
Copy link
Author

JeppeTh commented Jun 25, 2024

Sorry this side track was all wrong. I now realized the ranch Ref wasn't just the name it's {Name, Ip, Port} 🙈

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants