[DEV-11497] Socket connection management #8

kpsroka · 2024-08-07T19:10:08Z

Rewrites the PhoenixSocket class to simplify connection management and message processing.

While this PR improves socket handling in the Phoenix library, it is not guaranteed to resolve any particular problem. Its intention is to primarily make the socket flow easier to understand and debug.

The major code change is that PhoenixSocket has been split into several classes, mainly SocketConnectionManager and its supporting classes.

The SocketConnectionManager is responsible for keeping a connection alive: it tracks the current WebSocket connection state, reconnects on failures, and provides a communication interface to PhoenixSocket. The upstream interface (towards the PhoenixSocket class) consists of three callbacks supplied to its constructor:

* onMessage(String), for receiving messages from the WebSocket connection;
* onError(Object, [StackTrace]), for communicating errors in either the initialization or the operation of the connection;
* onStateChange(WebSocketConnectionState), which communicates the current state of the connection (initialized, ready, closing, or closed).

The downstream interface is the addMessage(String) method.
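
To make the shape of this interface easier to picture, here is a minimal sketch of it. It is not the code from this PR: the class name ConnectionManagerSketch, the start() and dispose() methods, and the loopback behavior (messages are echoed back instead of being sent over a WebSocket) are assumptions made purely for illustration. Only the three callback signatures, the state names, and addMessage(String) come from the description above.

```dart
/// Connection states as listed in the PR description.
enum WebSocketConnectionState { initialized, ready, closing, closed }

/// Illustrative stand-in for SocketConnectionManager (hypothetical name).
/// A loopback replaces the real WebSocket so the callback flow can be
/// followed without a server.
class ConnectionManagerSketch {
  ConnectionManagerSketch({
    required this.onMessage,
    required this.onError,
    required this.onStateChange,
  });

  /// Upstream: a message arrived from the connection.
  final void Function(String message) onMessage;

  /// Upstream: initialization or operation of the connection failed.
  final void Function(Object error, [StackTrace? stackTrace]) onError;

  /// Upstream: the connection moved to a new state.
  final void Function(WebSocketConnectionState state) onStateChange;

  WebSocketConnectionState _state = WebSocketConnectionState.closed;

  /// Assumption: simulates a connection attempt that succeeds immediately.
  void start() {
    _setState(WebSocketConnectionState.initialized);
    _setState(WebSocketConnectionState.ready);
  }

  /// Downstream: hand a message to the connection.
  /// The sketch echoes it back through onMessage instead of sending it.
  void addMessage(String message) {
    if (_state != WebSocketConnectionState.ready) {
      onError(StateError('Connection is not ready'), StackTrace.current);
      return;
    }
    onMessage(message);
  }

  /// Assumption: closes the simulated connection for good.
  void dispose() {
    _setState(WebSocketConnectionState.closing);
    _setState(WebSocketConnectionState.closed);
  }

  void _setState(WebSocketConnectionState next) {
    _state = next;
    onStateChange(next);
  }
}
```

A caller would pass its own handlers to the constructor, call start(), and then use addMessage(...) for everything it wants to send; all feedback arrives through the three callbacks.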

In addition, there are several smaller classes working with SocketConnectionManager:

* _WebSocketConnection, which is a thin wrapper around WebSocketChannel, managing its stream/sink interface.
* _ConnectionCallbacks, which encapsulates the upstream callbacks and performs several checks to prevent obsolete or invalid calls from propagating.
* SocketConnectionAttempt, which identifies a connection attempt and allows a bit of control over the reconnection delay.
* ConnectionInitializationException.
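
As an illustration of what "preventing obsolete or invalid calls from propagating" could look like, here is a small sketch of a callback guard. The description does not say which checks _ConnectionCallbacks performs, so the two shown here (dropping repeated state notifications, and dropping everything after the connection has closed) are assumptions, as are the names GuardedCallbacksSketch, stateChanged, and message.

```dart
/// Connection states as listed in the PR description.
enum WebSocketConnectionState { initialized, ready, closing, closed }

/// Hypothetical guard around the upstream callbacks. The checks below are
/// examples chosen for illustration, not the ones used in the PR.
class GuardedCallbacksSketch {
  GuardedCallbacksSketch({
    required this.onMessage,
    required this.onStateChange,
  });

  final void Function(String message) onMessage;
  final void Function(WebSocketConnectionState state) onStateChange;

  WebSocketConnectionState? _lastState;

  /// Forwards a state change unless it is a repeat of the last state or
  /// arrives after the connection has already been reported as closed.
  void stateChanged(WebSocketConnectionState next) {
    if (_lastState == WebSocketConnectionState.closed) return;
    if (_lastState == next) return;
    _lastState = next;
    onStateChange(next);
  }

  /// Forwards a message only while the connection is reported as ready.
  void message(String text) {
    if (_lastState != WebSocketConnectionState.ready) return;
    onMessage(text);
  }
}
```

The point of such a wrapper is that the class receiving the callbacks never has to ask whether a notification still belongs to a live connection.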

The PhoenixSocket class is now mostly responsible for messaging and keeping track of the channels. The change in event handling is that this class no longer attempts to control the underlying WebSocket connection in response to the socket's events. Instead, the internals of SocketConnectionManager control connections and act on errors and closes to keep the connection running.

This PR also removes PhoenixRawSocket, which is unused and incompatible with the changes.

Tests are pending, but I've been running it successfully as a Mac and iOS app for the past day.

roughike · 2024-08-09T07:56:31Z

I read through the changes. Before addressing the tiny details, I'd like to talk about the big picture.

I agree with you that the WTFs/minute when reading the library's source code is a bigger number than it should be. It's supposed to closely follow the reference implementation, which, at times, it does not do a very good job of. However, many of us are familiar with the twisted way the library works, as we've debugged and tried to understand it for the past few years as we've fixed issues.

If we merge your PR, and replace a lot of old code with your new code, we'd be throwing a lot of that knowledge away. For the near future, this PR might simplify things on the surface level by splitting things up, but at a cost of replacing "the devil we know" with a new saint that we don't know that well yet.

> @kpsroka: While this PR improves socket handling in the Phoenix library, it is not guaranteed to resolve any particular problem. Its intention is to primarily make the socket flow easier to understand and debug.

This being the case, can we make very minimal (or zero?) changes to the existing code while achieving the same goal?

For example:

* Increase logging without changing existing code.
* Add more documentation comments about certain flows. For example, about the heartbeats (they're sent every 30s, if we receive server response, we effectively reset the heartbeat timeout), connection state changes, etc. Might be a double-edged sword if the documentation is not kept up to date.
* Add inline code comments at places where code is hard to read. I was going to say the other option is to make minimal changes to make the code more readable in separate PRs, but maybe it's better to stay as close to the reference implementation as possible so that it's easy to compare the two.
* If there are clear problems we need to improve, make purpose-built PRs for those.
* Something else?
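
The heartbeat flow mentioned in the list above is the kind of thing a short documented sketch can capture. The following is one possible reading of it, not the library's actual code: the class name HeartbeatSketch and its methods are hypothetical, and the rule that a heartbeat left unanswered for one full interval counts as a timeout is an assumption. Only the 30-second interval and the "a server response resets the timeout" behavior come from the text.

```dart
import 'dart:async';

/// Hypothetical heartbeat helper, for illustration only.
class HeartbeatSketch {
  HeartbeatSketch({
    this.interval = const Duration(seconds: 30),
    required this.sendHeartbeat,
    required this.onTimeout,
  });

  /// Time between heartbeats (30s according to the discussion).
  final Duration interval;

  /// Called when a heartbeat should be sent to the server.
  final void Function() sendHeartbeat;

  /// Called when the previous heartbeat was never answered.
  final void Function() onTimeout;

  Timer? _timer;
  bool _awaitingReply = false;

  /// Starts the cycle; the first heartbeat goes out after one interval.
  void start() => _schedule();

  /// To be called when the server answers: resets the heartbeat timeout.
  void replyReceived() {
    _awaitingReply = false;
    _schedule();
  }

  /// Stops the cycle without reporting a timeout.
  void dispose() {
    _timer?.cancel();
    _timer = null;
  }

  void _schedule() {
    _timer?.cancel();
    _timer = Timer(interval, _tick);
  }

  void _tick() {
    if (_awaitingReply) {
      // Assumption: no reply within a full interval means the link is dead.
      _timer = null;
      onTimeout();
      return;
    }
    _awaitingReply = true;
    sendHeartbeat();
    _schedule();
  }
}
```

Whoever owns the connection decides what onTimeout does, for example closing the socket so that the reconnection logic can take over.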

As this PR currently stands, it feels a bit risky to merge it, especially when it does not fix anything concrete. We also don't have that many tests to begin with, so more minimal changes with a clear goal would be easier to manage.

kpsroka · 2024-08-12T11:33:27Z

> I read through the changes. Before addressing the tiny details, I'd like to talk about the big picture.
>
> I agree with you that the WTFs/minute when reading the library's source code is a bigger number than it should be. It's supposed to closely follow the reference implementation, which, at times, it does not do a very good job of. However, many of us are familiar with the twisted way the library works, as we've debugged and tried to understand it for the past few years as we've fixed issues.
>
> If we merge your PR, and replace a lot of old code with your new code, we'd be throwing a lot of that knowledge away. For the near future, this PR might simplify things on the surface level by splitting things up, but at a cost of replacing "the devil we know" with a new saint that we don't know that well yet.

I think that while we might be familiar with the current code, its being hard to understand also makes us avoid changing it even when we suspect something is wrong with it; it is more pleasant to fix our own code instead. The other benefit of making it easier to follow is that more people will be able to learn it and detect errors. I understand that this relies on the relative simplicity of the new solution, so I'm open to updating this PR to simplify it even more, but I disagree with the general sentiment of valuing current knowledge over future use.

> > @kpsroka: While this PR improves socket handling in the Phoenix library, it is not guaranteed to resolve any particular problem. Its intention is to primarily make the socket flow easier to understand and debug.
>
> This being the case, can we make very minimal (or zero?) changes to the existing code while achieving the same goal?
>
> For example:
> * Increase logging without changing existing code.
> * Add more documentation comments about certain flows. For example, about the heartbeats (they're sent every 30s, if we receive server response, we effectively reset the heartbeat timeout), connection state changes, etc. Might be a double-edged sword if the documentation is not kept up to date.
> * Add inline code comments at places where code is hard to read. I was going to say the other option is to make minimal changes to make the code more readable in separate PRs, but maybe it's better to stay as close to the reference implementation as possible so that it's easy to compare the two.
> * If there are clear problems we need to improve, make purpose-built PRs for those.
> * Something else?

I don't think these can fully replace the code changes themselves, and applying them to the existing code without any structural change could make the end result less readable.

> As this PR currently stands, it feels a bit risky to merge it, especially when it does not fix anything concrete. We also don't have that many tests to begin with, so more minimal changes with a clear goal would be easier to manage.

I probably should add more context: the intention for this PR was to resolve the duplicate connection problem by changes to the socket layer. But as with other PRs that we merged to resolve this problem, I cannot guarantee that it's going to fix it. From tests on my devices I didn't notice this problem surfacing*, but I'll introduce more tests for this, including probable causes.

I'm putting this PR into draft mode for the time being; I'll focus on some tests and additional changes to simplify this even further.

* I did see an instance of multiple rapid connections/disconnections, though. I'll investigate what's causing this (probably some calls to connect get queued too many times, when only the last one prevails).

brian-superlist · 2024-08-13T10:48:13Z

Summary of chat:

Naming and semantic meaning of ready and connected. Important for all of us to be on same page, would be good to hear from Iiro and Miguel about their opinion.
Want to ensure this version does not preclude us from using binary protobufs in the future. Didn't sound like this would create any blockers for that.

kpsroka · 2024-08-13T17:04:04Z

> Summary of chat:
>
> Naming and semantic meaning of ready and connected. Important for all of us to be on same page, would be good to hear from Iiro and Miguel about their opinion.
>
> Want to ensure this version does not preclude us from using binary protobufs in the future. Didn't sound like this would create any blockers for that.

Replaced the PhoenixSocket._isConnected field with _isOpen, which is also in line with the Phoenix state event naming.

Added some tests, and more comments. Will work on more tests tomorrow, but I think it's ready to review in detail now @roughike @miguelcmedeiros

brian-superlist

Some inline comments around naming and also implementation.

lib/src/socket_connection_attempt.dart

lib/src/socket_connection.dart

lib/src/socket.dart

lib/src/socket_state.dart

lib/src/socket.dart

kpsroka · 2024-08-20T09:36:50Z

@brian-superlist @miguelcmedeiros @roughike

Updated the structure of how the code is split into classes, with the main intentions being simplicity and preservation of the current behavior of PhoenixSocket.connect().

What changed:

* SocketConnectionAttempt was renamed to DelayedCallback and gained a callback that is executed when the delay completes. It is now a more generic class, effectively serving as Future.delayed with control over the delay. I wanted to preserve it as a dependency-less utility, so it has no connection (nominal or logical) to "sockets".
* SocketConnectionManager is now unstoppable: I removed the "stop" facility, so once started it can only be disposed. This allows for a simpler life-cycle. However, it now needs to be reinstantiated whenever PhoenixSocket closes (disposing of the manager) and reopens.
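
For readers who have not opened the diff, "Future.delayed with control over the delay" can be pictured roughly as follows. This is a sketch under assumptions, not the DelayedCallback from the PR: the class name DelayedCallbackSketch and the skipDelay() and cancel() methods are hypothetical and were chosen only to show the idea of a delayed callback that can be run early or dropped.

```dart
import 'dart:async';

/// Hypothetical, dependency-free delayed callback, for illustration only.
class DelayedCallbackSketch {
  DelayedCallbackSketch({
    required Duration delay,
    required void Function() callback,
  }) : _callback = callback {
    _timer = Timer(delay, _run);
  }

  final void Function() _callback;
  late final Timer _timer;
  bool _done = false;

  /// True once the callback has run or has been cancelled.
  bool get isDone => _done;

  /// Runs the callback immediately instead of waiting out the delay.
  void skipDelay() {
    if (_done) return;
    _timer.cancel();
    _run();
  }

  /// Drops the callback; it will never run.
  void cancel() {
    _timer.cancel();
    _done = true;
  }

  void _run() {
    if (_done) return;
    _done = true;
    _callback();
  }
}
```

With something of this shape, a reconnection could be scheduled with a back-off delay and still be triggered at once when the caller asks for an immediate connection.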

lib/src/socket.dart

lib/src/socket_connection.dart

brian-superlist · 2024-08-22T15:31:42Z

Hey all, I'm sorry to leave this comment right before I leave, but I'm not sure I fully support merging this PR. Right now, the network connection is pretty stable and recovers well. We have not had users writing in reports about seeing the offline indicator, nor are we sure we are still running into double connection issues. Finally, it looks like we are failing on a few different integration tests now.

Therefore, if this PR does not solve a specific issue, while possibly causing some regressions, I'm not sure if it's worth it. Readability is in the eye of the beholder, and I personally think this is harder to reason about than the previous code. Miguel mentioned that I might have Stockholm syndrome 😅, and I actually sympathize with that description. However, to make such a large change to a library that is now quite stable, we should have a strong case for it: What specifically does it improve?

I don't want to block the merging of this PR if the team feels it's the right direction, but I wanted to note that even after these latest changes I'm also hesitant about this changeset and the possibility that it might cause us new headaches with connections right after we solved so many of them.

If we want to continue with the plan we have right now: Fix up the integration tests, merge early next week, and test, I can disagree and commit. However, I wanted to try to clarify that I'm still not fully sold on this set of changes.

roughike · 2024-08-23T07:54:50Z

Just for the record - I echo Brian's comment and what I said 2 weeks ago.

Feels like the risk-to-reward ratio is off with this one. I don't feel like it's the right direction, I don't think the changes simplify anything, and I think it actually makes things harder to follow for me.

I don't want to merge this one. If you and @miguelcmedeiros feel strongly about merging this one, I will not block you, but I'll assume that you know what you're doing and know the risks.

If you do decide to merge, at least the failing integration tests must be fixed or there has to be a very good reason why not to fix them.

MarcelKaeding · 2024-08-23T13:24:59Z

Let's not merge before the 1.14.0 build has been sent to Testlio. We can then leave it in internal testing for 1 cycle before we release it to users.

kpsroka · 2024-08-27T16:08:09Z

> Just for the record - I echo Brian's comment and what I said 2 weeks ago.
>
> Feels like the risk-to-reward ratio is off with this one. I don't feel like it's the right direction, I don't think the changes simplify anything, and I think it actually makes things harder to follow for me.

I can add additional tests over the ones that are here, and that I've added myself, to even further minimize the risk.

> I don't want to merge this one. If you and @miguelcmedeiros feel strongly about merging this one, I will not block you, but I'll assume that you know what you're doing and know the risks.

That's why I'd like to have it tested internally on daily before pushing it to production.

> If you do decide to merge, at least the failing integration tests must be fixed or there has to be a very good reason why not to fix them.

After a bit of digging I realized that the backend at example/backend is a backend for integration testing (maybe the backend). I removed two tests that had not been passing since (at least) 658661b, and adjusted some others, mostly to call socket.dispose(), either in place of socket.close() or where no disposal happened at all.

kpsroka added 11 commits August 6, 2024 19:51

Removes unused PhoenixRawSocket

46ed4e3

Revamp of the PhoenixSocket library.

a394dfb

Simplify and correct

3722cec

Improves heartbeat handling

3788e7c

Final touches (lots of them)

26b0ba7

Even more improvements!

dd00743

Simplify, document, and improve

2824513

Merge branch 'master' into nextgen

f639b9b

More simplification

335b233

Reverting unncessary pubspec change

d4bf5df

Fixing lastState bug.

f9def19

kpsroka requested review from roughike, miguelcmedeiros and brian-superlist August 8, 2024 13:25

kpsroka marked this pull request as draft August 12, 2024 11:33

Simplification+name adjustments

451a6a7

Neelansh-ns and others added 5 commits August 13, 2024 11:40

add timeout to websocket connection and socket heartbeat (braverhealt…

25fa1da

…h#86) * add timeout to websocket connection and socket heartbeat * added timeout to WebSocket ready future * closing sink before assigning null to ws * cancel heartbeat when socket is disposed

Some initial tests for SocketConnectionManager

2562f10

Renames _isConnected to _isOpen

21b603f

Adds more comments

758bac5

Fixing some tests

be87621

kpsroka marked this pull request as ready for review August 13, 2024 17:04

brian-superlist reviewed Aug 14, 2024

lib/src/socket.dart (resolved)

brian-superlist reviewed Aug 14, 2024

lib/src/socket.dart (outdated, resolved)

brian-superlist reviewed Aug 14, 2024

lib/src/socket.dart (outdated, resolved)

Updates comments

30a397e

kpsroka requested review from miguelcmedeiros and brian-superlist August 20, 2024 09:37

miguelcmedeiros reviewed Aug 20, 2024

lib/src/socket.dart (outdated, resolved)

Refactors PhoenixSocket.sendMessage to get better control over heartbeat

77a41bb

kpsroka requested a review from miguelcmedeiros August 21, 2024 08:32

miguelcmedeiros approved these changes Aug 21, 2024

lib/src/socket_connection.dart (resolved)

kpsroka added 3 commits August 21, 2024 13:29

Avoids unnecessary _maybeConnect() in SocketConnectionManager.start()

9ecf278

Updates mocks

bcafd10

Merge branch 'master' into nextgen

4d96d47

kpsroka added 7 commits August 27, 2024 11:53

Small improvements in PhoenixException

1e14a29

Fixes order of operation on socket close

b8e69b8

Adds timeout to ws.ready

dd7208e

Avoids duplicating close events

64bce8b

Fixing PhoenixSocket's close/dispose

5892858

Updates tearDown in channel_integration_tests.dart

79c0e99

Removes obsolete tests

aac862f

kpsroka force-pushed the nextgen branch from 73abfde to aac862f Compare August 27, 2024 11:26

kpsroka added 3 commits August 27, 2024 17:37

Updates mocks

f2aab67

Replaces MockPhoenixSocketOptions with actual object

bb46924

Adding socket.dispose to integration tests, fixing one test

b96c2d5

kpsroka added 4 commits August 30, 2024 15:36

Updates range of DelayedCallback._id to be JS-compatible

97cfb6f

Makes failed heartbeat not attempt to reconnect after ConnectionManag…

f633357

…er is closed or disposed

Makes tests run non-concurrently

d87e620

Merge pull request #10 from superlistapp/fix/dev-11757-stateerror-bad…

28b0006

…-state-cannot-reconnect-a-disposed-socket [part of DEV-11757] Makes failed heartbeat not attempt to reconnect after ConnectionManager is closed or disposed
