-
-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix sending local state to other participants #4558
Draft
danxuliu
wants to merge
27
commits into
master
Choose a base branch
from
fix-sending-local-state-to-other-participants
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The PeerConnectionWrapper does not need to be injected in the application, nor the Context needs to be injected in the PeerConnectionWrapper. This all seems to be leftovers from the past, and removing them would ease adding unit tests for the PeerConnectionWrapper. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Log methods are static, so they can not be mocked using Mockito. Although it might be possible to use PowerMockito a dummy implementation was added instead, as Log uses are widespread and it is not something worth mocking anyway. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This implicitly fixes trying to send the initial state on the latest remote data channel found (which is the one stored in the "dataChannel" attribute of the "PeerConnectionWrapper") when any other existing data channel changes its status to open. Nevertheless, as all this will be reworked, no unit test was added for it. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The legacy name was a bit strange, so now it is renamed to just "send" as the parameter type ("DataChannelMessage") gives enough context. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages are expected to be sent using the "status" data channel that is locally created. However, if another data channel was opened by the remote peer the reference to the "status" data channel was overwritten with the new data channel, and messages were sent instead on the remote data channel. In current Talk versions that was not a problem, and the change makes no difference either, because since the support for Janus 1.x was added data channel messages are listened on all data channels, independently of their label or whether they were created by the local or remote peer. However, in older Talk versions this fixes a regression introduced with the support for Janus 1.x. In those versions only messages coming from the "status" or "JanusDataChannel" data channels were taken into account. When Janus is not used the WebUI opens the legacy "simplewebrtc" data channel, so that data channel may be the one used to send data channel messages (if it is open after the "status" data channel), but the messages received on that data channel were ignored by the WebUI. Nevertheless, at this point this is more an academic problem than a real world problem, as it is unlikely that there are many Nextcloud servers with Talk < 16 and without HPB being used. Independently of all that, when the peer connection is removed only the "status" data channel is disposed, but none of the remote data channels are. This is just a variation of an already existing bug (the last open data channel was the one disposed due to being the last saved reference, but the rest were not) and it will be fixed in another commit. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages can be sent only when the data channel is open. Otherwise the message is simply lost. Clients of the PeerConnectionWrapper do not need to be aware of that detail or keep track of whether the data channel was open already or not, so now data channel messages sent before the data channel is open are queued and sent once the data channel is opened. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Getting the label is no longer possible once the data channel has been disposed. This will help to make the observer thread-safe. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Adding and disposing remote data channels is done from different threads; they are added from the WebRTC signaling thread when "onDataChannel" is called, while they can be disposed potentially from any thread when "removePeerConnection" is called. To prevent race conditions between them now both operations are synchronized. However, as "onDataChannel" belongs to an inner class it needs to use a synchronized statement with the outer class lock. This could still cause a race condition if the same data channel was added again; this should not happen, but it is handled just in case. Moreover, once a data channel is disposed it can be no longer used, and trying to call any of its methods throws an "IllegalStateException". Due to this, as sending can be also done potentially from any thread, it needs to be synchronized too with removing the peer connection. State changes on data channels as well as receiving messages are also done in the WebRTC signaling thread. State changes needs synchronization as well, although receiving messages should not, as it does not directly use the data channel (and it is assumed that using the buffers of a disposed data channel is safe). Nevertheless a little check (which in this case requires synchronization) was added to ignore the received messages if the peer connection was removed already. Finally, the synchronization added to "send" and "onStateChange" had the nice side effect of making the pending data channel messages thread-safe too, as before it could happen that a message was enqueued when the pending messages were being sent, which caused a "ConcurrentModificationException". Signed-off-by: Daniel Calviño Sánchez <[email protected]>
When the data channel is not open yet data channel messages are queued and then sent once opened. "onStateChange" is called from the WebRTC signaling thread, while "send" can be called potentially from any thread, so to send the data channel messages in the same order that they were added new messages need to be enqueued until all the pending messages have been sent. Otherwise, even if there is synchronization already, it could happen that "onStateChange" was called but, before getting the lock, "send" gets it and sends the new message before the pending messages were sent. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
"hasMCU" (which has always been the wrong name, because it is an SFU rather than an MCU, but it is wrong even in the signaling server so for now the legacy name is kept) was set again and again whenever the call participant list changed. Now it is set instead once its value is known, that is, when it is known that the internal signaling server is used (as no "MCU" is used in that case), or when the connection with the external signaling server is established, as its supported features are not known until then. This change should have no effect in the usages of "hasMCU", as it is used when the call participant list change, which will happen only after joining the call in the signaling server, or when sending "isSpeaking" and toggling media, in both cases guarded by "isConnectionEstablished", which will be true only once "performCall" was called or if the call is active with other participants. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
danxuliu
added
bug
Something isn't working
2. developing
Work in progress
feature: ☎️ call
labels
Dec 19, 2024
danxuliu
force-pushed
the
fix-sending-local-state-to-other-participants
branch
from
December 20, 2024 15:09
6bfe01b
to
87ff9e9
Compare
For now it just provides support for sending a data channel message to all participants, so notifying all participants when the media is toggled or the speaking status change can be directly refactored to use it. While it would have been fine to use a single class for both MCU and no MCU they were split for easier and cleaner unit testing in future stages. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages are expected to be sent only to peer connections with "video" type, which provide the audio and video tracks of the participant (and, in fact, peer connections for screen shares do not even have data channels enabled in the WebUI). Note that this could change if at some point several audio/video tracks are sent in the same peer connection, or if "speaking" messages are added to screen shares, but that will be addressed if/when that happens. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This is the counterpart of CallParticipantModel for the local participant. For now it just stores whether audio and video are enabled or not, and whether the local participant is speaking or not, but it will be eventually extended with further properties. It is also expected that the views, like the button with the microphone state, will update themselves based on the model. Similarly the model should be moved from the CallActivity to a class similar to CallParticipant but for the local participant. In any case, all that is something for the future; the immediate use of the model will be to know when the local state changes to notify other participants. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The LocalStateBroadcaster observes changes in the LocalCallParticipantModel and notifies other participants in the call as needed. Although it is created right before joining the call there is a slim chance of the state changing before the local participant is actually in the call, but even in that case other participants would not be notified about the state due to the MessageSender depending on the list of call participants / peer connections passed to it, which should not be initialized before the local participant is actually in the call. There is, however, a race condition that could cause participants to not be added to the participant list if they join at the same time as the local participant and a signaling message listing them but not the local participant as in the call is received once the CallParticipantList was created, but that is unrelated to the broadcaster and will be fixed in another commit. Currently only changes in the audio, speaking and video state are notified, although in the future it should also notify about the nick, the raised hand or any other state (but not one-time events, like reactions). The notifications right now are sent only through data channels, but at a later point they will be sent also through signaling messages as needed. Similarly, although right now it only notifies of changes in the state it will also take care of notifying other participants about the current state when they join the call (or the local participant joins). Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This is not possible when Janus is used, as Janus only allows broadcasting data channel messages to all the subscribers of the publisher connection. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Note that this implicitly send the current state to remote participants when the local participant joins, as in that case all the remote participants already in the call join from the point of view of the local participant Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This will be used to have separate counts for data channel and signaling messages. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
danxuliu
force-pushed
the
fix-sending-local-state-to-other-participants
branch
from
December 22, 2024 07:55
87ff9e9
to
2fd628b
Compare
The speaking state is still sent only through data channels, as it is not currently handled by other clients when sent through signaling messages. Signed-off-by: Daniel Calviño Sánchez <[email protected]>
danxuliu
force-pushed
the
fix-sending-local-state-to-other-participants
branch
from
December 23, 2024 12:35
2fd628b
to
d094dc5
Compare
APK file: https://www.kaminsky.me/nc-dev/android-artifacts/4558-talk.apk |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #3358
Requires #4536 (so set as draft until 4536 is merged and this one rebased)
First of all, sorry for the long, long delay and thanks a lot to everyone who provided information on the issue.
Note that the description below treats #4536 as part of this pull request (so the behaviours of the code are described in a pre-4536 state).
The problem started to happen in a0fa841 because, as described in the first part of #3358 (comment), the video is set as not available (or, rather, not known if available) when the connection state changes to
NEW
orCHECKING
. This was done following the implementation of the WebUI; the idea is that until the connection is completly established it is not really known if the video is enabled or not, even if the stream has a video track (it could be a disabled video anyway, so the legacy behaviour to set it available if there is a track should be removed, but that is a different story), so the other client should explicitly provide that information once the connection is established (turns out that, although all that is correct with data channels, with signaling messages it would be possible to provide the state even before the peer connection is established, so this might need to be adjusted in the future). Although the change itself caused the video to no longer be shown (in some cases) the problem is in the sending side of the Android app.As also described in the second part of #3358 (comment)
sendInitialMediaStatus
would not do anything when called on a subscriber peer connection, both because it does not have a local stream and because, even if it had, it is not the right connection to send data channel messages when using the HPB, as they must be sent in the publisher connection. But independently of thatsendInitialMediaStatus
will not be called anyway in most cases; it will not be called when the data channel is open, because for subscribers the observer is registered after the data channel was open already, and it will be randomly called when the connection state changes toCONNECTED
, because it depends onhasInitiated
, which is meant to be used to initiate the peer connection without HPB if the local participant session ID is "higher" than the remote participant session ID, but is not needed and, moreover, is wrong when using the external signaling server as in that case the comparison is made between the Nextcloud session ID and the signaling session ID.In the case of publisher connections the data channel message will be also randomly sent when the connection state changes to
CONNECTED
due to the comparison between Nextcloud session ID and signaling session ID, although it will be always sent when the data channel is open. This means that if participant A is in a call and participant B joins then the state of participant A will not be sent to participant B (because it will be sent on the subscriber connection), but the state of participant B will be sent to participant A (because it will be sent on the publisher connection when the data channel is open). But then... why is the video of participant B visible for participant A only sometimes?The reason is that when the HPB is used the connection is not established directly between both clients, but between the clients and the HPB. Therefore, when the data channel is open for participant B it is open between participant B and the HPB, but the data channel may not be open yet between participant A and the HPB. If the message is sent at this point it will reach the HPB, but it will not be relayed to participant A. Due to that reason the WebUI does not send just a single state message after the connection is established, but several ones with an exponential backoff to "ensure" that the state is received by the other participants even if their subscriber connection takes a while to be established.
But now the question is, why is video coming from the Android app always shown in the iOS app and the WebUI, despite all of the above?
The iOS app shows the video by default, so if a video track is sent by the Android app it is shown even if the Android app does not send a data channel message to enable it.
In the case of the WebUI received videos are disabled by default, but they are automatically enabled if it is detected that a video is being sent (which is a legacy behaviour and should not work like that, video should be shown only if explicitly enabled, but that is a different story). Therefore, again the video track sent by the Android app is shown even if the Android app does not send a data channel message to enable it.
In order to solve the issue, this pull request introduces the
LocalStateBroadcasterXXX
helper classes that take care of sending the local state to the other participants as needed. That is:Before this pull request the state was sent only through data channels; now the state is sent through signaling messages too, as it is expected by other clients (in the past data channels were found to be problematic, but the signaling messages were more reliable and convenient, so that is why the state was moved to be sent also through signaling messages; the only exception is the speaking state, which, due to its potential frequency and being not so relevant, was left only as data channel messages to avoid hammering the database when the HPB is not used). Note that this should be backwards compatible with older Nextcloud versions, as if any signaling message was not handled it was just ignored, so there is no problem sending them.
The
LocalStateBroadcaster
(or, rather, its subclasses) also takes into account the differences between the internal or the external signaling server in order to send the messages using the appropriate connection.All this is almost the same done in the WebUI, with some little improvements:
Despite the improvements this approach is far from perfect and there is still an excessive amount of initial messages sent with the HPB due to the repeated sending. But this is tracked in nextcloud/spreed#8549 and it will need to be solved across all the clients at the same time.
Pending
Follow ups
LocalStateBroadcaster