-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DEV-11497] Socket connection management #8
base: master
Are you sure you want to change the base?
Conversation
I read through the changes. Before addressing the tiny details, I'd like to talk about the big picture. I agree with you that the WTFs/minute when reading the library's source code is a bigger number that it should be. It's supposed to closely follow the reference implementation, which at times, it does not do a very good job at. However, many of us are familiar with the twisted way the library works, as we've debugged and tried to understand it for the past few years as we've fixed issues. If we merge your PR, and replace a lot of old code with your new code, we'd be throwing a lot of that knowledge away. For the near future, this PR might simplify things on the surface level by splitting things up, but at a cost of replacing "the devil we know" with a new saint that we don't know that well yet.
This being the case, can we make very minimal (or zero?) changes to the existing code while achieving the same goal? For example:
As this PR currently stands, it feels a bit risky to merge it, especially when it does not fix anything concrete. We also don't have that many tests to begin with, so more minimal changes with a clear goal would be easier to manage. |
I think that while we might be familiar with the current code, it being hard to understand also makes us avoid changing it even if we suspect something is wrong with it - it is more pleasant to fix our own code instead. The other benefit of having it easier is that more people will be able to learn it, and detect errors. I understand that this relies on relative simplicity of the new solution, so I'm open to updating this PR to simplify it even more, but I disagree with the general sentiment of valuing the current knowledge over future use.
I don't think these can fully replace the code change themselves, and the result with applying these changes to existing code without any structural change can make the end result less readable.
I probably should add more context: the intention for this PR was to resolve the duplicate connection problem by changes to the socket layer. But as with other PRs that we merged to resolve this problem, I cannot guarantee that it's going to fix it. From tests on my devices I didn't notice this problem surfacing*, but I'll introduce more tests for this, including probable causes. I'm putting this PR into draft mode for the time being, I'll focus on some tests and additional changes to simplify this even further. * I saw an instance of multiple rapid connection/disconnection though. I'll investigate what's causing this [probably some calls to connect get queued too many times, when the only last one prevails]). |
Summary of chat:
|
…h#86) * add timeout to websocket connection and socket heartbeat * added timeout to WebSocket ready future * closing sink before assigning null to ws * cancel heartbeat when socket is disposed
Replaced the PhoenixSocket._isConnected field with "_isOpen", which also is in line with the Phoenix state event naming. Added some tests, and more comments. Will work on more tests tomorrow, but I think it's ready to review in detail now @roughike @miguelcmedeiros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some inline comments around naming and also implementation.
@brian-superlist @miguelcmedeiros @roughike Updated structurally how the code is split in various classes, with main intention being simplicity, and preservation of current behavior of PhoenixScoket.connect(). What changed:
|
Hey all, I'm sorry to leave this comment right before I leave, but I'm not sure I fully support merging this PR. Right now, the network connection is pretty stable and recovers well. We have not had users writing in reports about seeing the offline indicator, nor are we sure we are still running into double connection issues. Finally, it looks like we are failing on a few different integration tests now. Therefore, if this PR does not solve a specific issue, while possibly causing some regressions, I'm not sure if it's worth it. Readability is in the eye of the beholder, and I personally think this is harder to reason about than the previous code. Miguel mentioned that I might have Stockholm syndrome 😅, and I actually sympathize with that description. However, to make such a large change to a library that is now quite stable, we should have a strong case for it: What specifically does it improve? I don't want to block the merging of this PR if the team feels it's the right direction, but I wanted to note that even after these latest changes I'm also hesitant about this changeset and the possibility that it might cause us new headaches with connections right after we solved so many of them. If we want to continue with the plan we have right now: Fix up the integration tests, merge early next week, and test, I can disagree and commit. However, I wanted to try to clarify that I'm still not fully sold on this set of changes. |
Just for the record - I echo Brian's comment and what I said 2 weeks ago. Feels like the risk-to-reward ratio is off with this one. I don't feel like it's the right direction, I don't think the changes simplify anything, and I think actually makes things more hard to follow for me. I don't want to merge this one. If you and @miguelcmedeiros feel strongly about merging this one, I will not block you, but I'll assume that you know what you're doing and know the risks. If you do decide to merge, at least the failing integration tests must be fixed or there has to be a very good reason why not to fix them. |
Let's merge not before the 1.14.0 build was sent to Testlio. We can then leave it in internal testing for 1 cycle before we release it to users. |
I can add additional tests over the ones that are here, and that I've added myself, to even further minimize the risk.
That's why I'd like to have it tested internally on daily before pushing it to production.
After a bit of digging I realized that the backend at |
…er is closed or disposed
…-state-cannot-reconnect-a-disposed-socket [part of DEV-11757] Makes failed heartbeat not attempt to reconnect after ConnectionManager is closed or disposed
Rewrites the
PhoenixSocket
class to simplify connection management and message processing.While this PR improves socket handling in the Phoenix library, it is not guaranteed to resolve any particular problem. Its intention is to primarily make the socket flow easier to understand and debug.
The major code change is that
PhoenixSocket
has been split into several classes, mainlySocketConnectionManager
and its supporting classes.The SocketConnectionManager is responsible for keeping a connection alive - it has facilities to keep track of current WebSocket connection state, reconnect connection on failures, and provides communication interface to
PhoenixSocket
. The upstream (toPhoenixSocket
class) interface consists of three callbacks supplied to its constructor:onMessage(String)
for receiving messages from the WebSocket connection,onError(Object, [StackTrace])
for communicating errors with either initialization or operation of the connection.onStateChange(WebSocketConnectionState)
which communicates current state of the connection (initialized, ready, closing, and closed).The downstream interface is the
addMessage(String)
method.In addition there are several smaller classes working with SocketConnectionManager:
_WebSocketConnection
which is a thin wrapper aroundWebSocketChannel
, managing its stream/sink interface._ConnectionCallbacks
which encapsulates the upstream callbacks, and performs several checks to prevent obsolete or invalid calls from propagating.SocketConnectionAttempt
which identifies connection attempt, and allows a bit of control over reconnection delay.ConnectionInitializationException
.The
PhoenixSocket
class is now mostly responsible for messaging and keeping track of the channels. What changes in some event handling is that this class no longer attempts to control the underlying WebSocket connection in response to the socket's events. Instead, the internals ofSocketConnectionManager
are responsible for controlling connections and acting on errors and closes to keep the connection running.This PR also removes
PhoenixRawSocket
, which is unused, and incompatible with the changes.Tests are pending, but I've been running it successfully as Mac and iOS app in the past day.