Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sending local state to other participants #4558

Draft
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

danxuliu
Copy link
Member

Fixes #3358

Requires #4536 (so set as draft until 4536 is merged and this one rebased)

⚠️ ⚠️ ⚠️ Work in progress, do NOT review ⚠️ ⚠️ ⚠️

First of all, sorry for the long, long delay and thanks a lot to everyone who provided information on the issue.

Note that the description below treats #4536 as part of this pull request (so the behaviours of the code are described in a pre-4536 state).

The problem started to happen in a0fa841 because, as described in the first part of #3358 (comment), the video is set as not available (or, rather, not known if available) when the connection state changes to NEW or CHECKING. This was done following the implementation of the WebUI; the idea is that until the connection is completly established it is not really known if the video is enabled or not, even if the stream has a video track (it could be a disabled video anyway, so the legacy behaviour to set it available if there is a track should be removed, but that is a different story), so the other client should explicitly provide that information once the connection is established (turns out that, although all that is correct with data channels, with signaling messages it would be possible to provide the state even before the peer connection is established, so this might need to be adjusted in the future). Although the change itself caused the video to no longer be shown (in some cases) the problem is in the sending side of the Android app.

As also described in the second part of #3358 (comment) sendInitialMediaStatus would not do anything when called on a subscriber peer connection, both because it does not have a local stream and because, even if it had, it is not the right connection to send data channel messages when using the HPB, as they must be sent in the publisher connection. But independently of that sendInitialMediaStatus will not be called anyway in most cases; it will not be called when the data channel is open, because for subscribers the observer is registered after the data channel was open already, and it will be randomly called when the connection state changes to CONNECTED, because it depends on hasInitiated, which is meant to be used to initiate the peer connection without HPB if the local participant session ID is "higher" than the remote participant session ID, but is not needed and, moreover, is wrong when using the external signaling server as in that case the comparison is made between the Nextcloud session ID and the signaling session ID.

In the case of publisher connections the data channel message will be also randomly sent when the connection state changes to CONNECTED due to the comparison between Nextcloud session ID and signaling session ID, although it will be always sent when the data channel is open. This means that if participant A is in a call and participant B joins then the state of participant A will not be sent to participant B (because it will be sent on the subscriber connection), but the state of participant B will be sent to participant A (because it will be sent on the publisher connection when the data channel is open). But then... why is the video of participant B visible for participant A only sometimes?

The reason is that when the HPB is used the connection is not established directly between both clients, but between the clients and the HPB. Therefore, when the data channel is open for participant B it is open between participant B and the HPB, but the data channel may not be open yet between participant A and the HPB. If the message is sent at this point it will reach the HPB, but it will not be relayed to participant A. Due to that reason the WebUI does not send just a single state message after the connection is established, but several ones with an exponential backoff to "ensure" that the state is received by the other participants even if their subscriber connection takes a while to be established.

But now the question is, why is video coming from the Android app always shown in the iOS app and the WebUI, despite all of the above?

The iOS app shows the video by default, so if a video track is sent by the Android app it is shown even if the Android app does not send a data channel message to enable it.

In the case of the WebUI received videos are disabled by default, but they are automatically enabled if it is detected that a video is being sent (which is a legacy behaviour and should not work like that, video should be shown only if explicitly enabled, but that is a different story). Therefore, again the video track sent by the Android app is shown even if the Android app does not send a data channel message to enable it.

In order to solve the issue, this pull request introduces the LocalStateBroadcasterXXX helper classes that take care of sending the local state to the other participants as needed. That is:

  • Send the current state to another participant in the call when that participant joins (if the local participant is the one that joined all the other participants joined from its point of view, so the state is sent to all of them); if the HPB is used the state is sent several times with an exponential backoff to solve the problem explained above about the connection not being established yet in the other end
  • Send the state changes to all the participants in the call; in this case a single message is sent
    • If the state is changed but a connection was not established yet the changed state should be eventually received through the initial exponential backoff (as it sends the current state, not the original state)

Before this pull request the state was sent only through data channels; now the state is sent through signaling messages too, as it is expected by other clients (in the past data channels were found to be problematic, but the signaling messages were more reliable and convenient, so that is why the state was moved to be sent also through signaling messages; the only exception is the speaking state, which, due to its potential frequency and being not so relevant, was left only as data channel messages to avoid hammering the database when the HPB is not used). Note that this should be backwards compatible with older Nextcloud versions, as if any signaling message was not handled it was just ignored, so there is no problem sending them.

The LocalStateBroadcaster (or, rather, its subclasses) also takes into account the differences between the internal or the external signaling server in order to send the messages using the appropriate connection.

All this is almost the same done in the WebUI, with some little improvements:

  • In the WebUI, when another participant joins the call, the message is sent to all participants rather than only to the participant that joined.
    • This is still the case in the Android app with Janus and data channels, but there is no way around that.
  • In the WebUI the state is sent again whenever the connection changes to the connected state. Therefore the state is sent again without need when a connection is temporarily interrupted (changes to disconnected and then back to connected). However, in the case of an ICE restart that would also send the state again, which would happen without HPB after the connection failed.
  • In the WebUI the state is not sent if the connection changes to the completed state; although typically the connection will go through the connected state before reaching the completed state it is possible to go directly from checking to completed, and in that case the state would not be sent.
  • In the WebUI, when the HPB is used, the state starts to be sent when a receiver connection is established with the other peer. Here the state is sent as soon as the remote participant is found, as the remote participant might already have a receiver connection to the local participant before the local participant has a receiver connection to the remote participant (although in most cases it should not make any difference, as establishing connections when the HPB is used is usually pretty fast).

Despite the improvements this approach is far from perfect and there is still an excessive amount of initial messages sent with the HPB due to the repeated sending. But this is tracked in nextcloud/spreed#8549 and it will need to be solved across all the clients at the same time.

Pending

  • Proper implementation of the exponential backoff
  • Send signaling messages also for the initial state with MCU

Follow ups

  • Send nick and raised hand state in LocalStateBroadcaster
  • Send status also when no peer connection will be established (right now it is not a problem because only media state is sent, but it will be with non-media state, like the nick and raised hand)

The PeerConnectionWrapper does not need to be injected in the
application, nor the Context needs to be injected in the
PeerConnectionWrapper. This all seems to be leftovers from the past, and
removing them would ease adding unit tests for the
PeerConnectionWrapper.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Log methods are static, so they can not be mocked using Mockito.
Although it might be possible to use PowerMockito a dummy implementation
was added instead, as Log uses are widespread and it is not something
worth mocking anyway.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This implicitly fixes trying to send the initial state on the latest
remote data channel found (which is the one stored in the "dataChannel"
attribute of the "PeerConnectionWrapper") when any other existing data
channel changes its status to open. Nevertheless, as all this will be
reworked, no unit test was added for it.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The legacy name was a bit strange, so now it is renamed to just "send"
as the parameter type ("DataChannelMessage") gives enough context.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages are expected to be sent using the "status" data
channel that is locally created. However, if another data channel was
opened by the remote peer the reference to the "status" data channel was
overwritten with the new data channel, and messages were sent instead on
the remote data channel.

In current Talk versions that was not a problem, and the change makes no
difference either, because since the support for Janus 1.x was added
data channel messages are listened on all data channels, independently
of their label or whether they were created by the local or remote peer.

However, in older Talk versions this fixes a regression introduced with
the support for Janus 1.x. In those versions only messages coming from
the "status" or "JanusDataChannel" data channels were taken into
account. When Janus is not used the WebUI opens the legacy
"simplewebrtc" data channel, so that data channel may be the one used to
send data channel messages (if it is open after the "status" data
channel), but the messages received on that data channel were ignored by
the WebUI. Nevertheless, at this point this is more an academic problem
than a real world problem, as it is unlikely that there are many
Nextcloud servers with Talk < 16 and without HPB being used.

Independently of all that, when the peer connection is removed only the
"status" data channel is disposed, but none of the remote data channels
are. This is just a variation of an already existing bug (the last open
data channel was the one disposed due to being the last saved reference,
but the rest were not) and it will be fixed in another commit.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages can be sent only when the data channel is open.
Otherwise the message is simply lost. Clients of the
PeerConnectionWrapper do not need to be aware of that detail or keep
track of whether the data channel was open already or not, so now data
channel messages sent before the data channel is open are queued and
sent once the data channel is opened.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Getting the label is no longer possible once the data channel has been
disposed. This will help to make the observer thread-safe.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Adding and disposing remote data channels is done from different
threads; they are added from the WebRTC signaling thread when
"onDataChannel" is called, while they can be disposed potentially from
any thread when "removePeerConnection" is called. To prevent race
conditions between them now both operations are synchronized.

However, as "onDataChannel" belongs to an inner class it needs to use a
synchronized statement with the outer class lock. This could still cause
a race condition if the same data channel was added again; this should
not happen, but it is handled just in case.

Moreover, once a data channel is disposed it can be no longer used, and
trying to call any of its methods throws an "IllegalStateException". Due
to this, as sending can be also done potentially from any thread, it
needs to be synchronized too with removing the peer connection.

State changes on data channels as well as receiving messages are also
done in the WebRTC signaling thread. State changes needs synchronization
as well, although receiving messages should not, as it does not directly
use the data channel (and it is assumed that using the buffers of a
disposed data channel is safe). Nevertheless a little check (which in
this case requires synchronization) was added to ignore the received
messages if the peer connection was removed already.

Finally, the synchronization added to "send" and "onStateChange" had the
nice side effect of making the pending data channel messages thread-safe
too, as before it could happen that a message was enqueued when the
pending messages were being sent, which caused a
"ConcurrentModificationException".

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
When the data channel is not open yet data channel messages are queued
and then sent once opened. "onStateChange" is called from the WebRTC
signaling thread, while "send" can be called potentially from any
thread, so to send the data channel messages in the same order that they
were added new messages need to be enqueued until all the pending
messages have been sent. Otherwise, even if there is synchronization
already, it could happen that "onStateChange" was called but, before
getting the lock, "send" gets it and sends the new message before the
pending messages were sent.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
"hasMCU" (which has always been the wrong name, because it is an SFU
rather than an MCU, but it is wrong even in the signaling server so for
now the legacy name is kept) was set again and again whenever the call
participant list changed. Now it is set instead once its value is known,
that is, when it is known that the internal signaling server is used (as
no "MCU" is used in that case), or when the connection with the external
signaling server is established, as its supported features are not known
until then.

This change should have no effect in the usages of "hasMCU", as it is
used when the call participant list change, which will happen only after
joining the call in the signaling server, or when sending "isSpeaking"
and toggling media, in both cases guarded by "isConnectionEstablished",
which will be true only once "performCall" was called or if the call is
active with other participants.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
@danxuliu danxuliu added bug Something isn't working 2. developing Work in progress feature: ☎️ call labels Dec 19, 2024
@danxuliu danxuliu force-pushed the fix-sending-local-state-to-other-participants branch from 6bfe01b to 87ff9e9 Compare December 20, 2024 15:09
For now it just provides support for sending a data channel message to
all participants, so notifying all participants when the media is
toggled or the speaking status change can be directly refactored to use
it.

While it would have been fine to use a single class for both MCU and no
MCU they were split for easier and cleaner unit testing in future
stages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Data channel messages are expected to be sent only to peer connections
with "video" type, which provide the audio and video tracks of the
participant (and, in fact, peer connections for screen shares do not
even have data channels enabled in the WebUI).

Note that this could change if at some point several audio/video tracks
are sent in the same peer connection, or if "speaking" messages are
added to screen shares, but that will be addressed if/when that happens.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This is the counterpart of CallParticipantModel for the local
participant. For now it just stores whether audio and video are enabled
or not, and whether the local participant is speaking or not, but it
will be eventually extended with further properties.

It is also expected that the views, like the button with the microphone
state, will update themselves based on the model. Similarly the model
should be moved from the CallActivity to a class similar to
CallParticipant but for the local participant. In any case, all that is
something for the future; the immediate use of the model will be to know
when the local state changes to notify other participants.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
The LocalStateBroadcaster observes changes in the
LocalCallParticipantModel and notifies other participants in the call as
needed. Although it is created right before joining the call there is a
slim chance of the state changing before the local participant is
actually in the call, but even in that case other participants would not
be notified about the state due to the MessageSender depending on the
list of call participants / peer connections passed to it, which should
not be initialized before the local participant is actually in the call.

There is, however, a race condition that could cause participants to not
be added to the participant list if they join at the same time as the
local participant and a signaling message listing them but not the local
participant as in the call is received once the CallParticipantList was
created, but that is unrelated to the broadcaster and will be fixed
in another commit.

Currently only changes in the audio, speaking and video state are
notified, although in the future it should also notify about the nick,
the raised hand or any other state (but not one-time events, like
reactions). The notifications right now are sent only through data
channels, but at a later point they will be sent also through signaling
messages as needed.

Similarly, although right now it only notifies of changes in the state
it will also take care of notifying other participants about the current
state when they join the call (or the local participant joins).

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This is not possible when Janus is used, as Janus only allows
broadcasting data channel messages to all the subscribers of the
publisher connection.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Note that this implicitly send the current state to remote participants
when the local participant joins, as in that case all the remote
participants already in the call join from the point of view of the
local participant

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
Signed-off-by: Daniel Calviño Sánchez <[email protected]>
This will be used to have separate counts for data channel and signaling
messages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
@danxuliu danxuliu force-pushed the fix-sending-local-state-to-other-participants branch from 87ff9e9 to 2fd628b Compare December 22, 2024 07:55
The speaking state is still sent only through data channels, as it is
not currently handled by other clients when sent through signaling
messages.

Signed-off-by: Daniel Calviño Sánchez <[email protected]>
@danxuliu danxuliu force-pushed the fix-sending-local-state-to-other-participants branch from 2fd628b to d094dc5 Compare December 23, 2024 12:35
Copy link
Contributor

APK file: https://www.kaminsky.me/nc-dev/android-artifacts/4558-talk.apk

qrcode

To test this change/fix you can simply download above APK file and install and test it in parallel to your existing Nextcloud Talk app.

Copy link
Contributor

Codacy

Lint

TypemasterPR
Warnings158158
Errors7172

SpotBugs

CategoryBaseNew
Bad practice66
Correctness22416
Dodgy code7171
Internationalization33
Malicious code vulnerability33
Performance44
Security11
Total312104

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. developing Work in progress bug Something isn't working feature: ☎️ call
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Video calls using HPB shows no video from android app to android app
1 participant