The problem

We frequently discard and re-fetch from scratch almost all the data from the server: the list of streams and their names, colors, etc.; the list of other users and their names, avatars, etc.; how many unread messages the user has and where; and so on.
Specifically, we do this:
(a) each time the app starts up when not already running;
(b) each time the user switches which account they're actively looking at;
(c) and each time the app becomes able to connect to the server again after 10 minutes of not being able to do so.
In particular, this applies when the device has been in airplane mode, or otherwise without a connection, for 10 minutes.
It might also apply when the app is put in the background, or the device is put to sleep or locked, if those cause the operating system to stop letting the app run and make network requests; I'm not sure to what extent mobile OSes currently do that in either of those situations.
What this looks like for the user is:
In cases (a) and (c), we show the stale data we have, along with a "Connecting..." banner at the top of the screen.
In addition to that behavior (which is intended, as long as we have the underlying issue that we don't have current data), there are some glitches if the user navigates to the message list while we're still connecting: the message list cycles through a glitchy sequence of states, and if the user starts writing a draft message it can get lost. (I believe this is #5152, "Just-opened narrow sometimes flickers 'no messages' before fetch has started".)
In case (b), once they switch accounts we just show the loading screen until we get the new server data. (We don't have any data, even stale, for the other account.)
The lack of data also means we can't offer features like showing, on the list-of-accounts screen, the number of unreads for each account. (This is an old feature request: #893, "Add unread counts to account view".)
A key technical point driving this behavior: in order to keep using a given set of server data, we need an active event queue on the Zulip server that tracks changes to that data. Without such a queue, the data unavoidably goes out of date, and there's no reliable way to keep it internally consistent.
A related technical point: if we have a set of server data that's old, but also have an active event queue for it, then we can bring the data up to date by just fetching the events. In a large organization, this is far less data than a refetch from scratch, so it may be much faster to acquire.
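To make that concrete, here's a rough TypeScript sketch (not the app's actual code) of the two paths: catching up from an existing event queue, versus re-fetching everything and opening a new one. The `QueueState` shape and helper names are made up for illustration; the endpoints and parameters are the documented Zulip API (GET /api/v1/events, POST /api/v1/register).

```ts
// Rough sketch only -- not the app's actual code.
type QueueState = { queueId: string; lastEventId: number };

async function get(realm: string, auth: string, path: string, params: Record<string, string>) {
  const response = await fetch(`${realm}/api/v1/${path}?${new URLSearchParams(params)}`, {
    headers: { Authorization: auth },
  });
  return response.json();
}

async function refresh(realm: string, auth: string, queue: QueueState | null) {
  if (queue) {
    // Cheap path: the queue may still be alive, so ask only for the events
    // since `lastEventId` and apply them to the data we already have.
    const resp = await get(realm, auth, 'events', {
      queue_id: queue.queueId,
      last_event_id: String(queue.lastEventId),
      dont_block: 'true', // return immediately instead of long-polling
    });
    if (resp.result === 'success') {
      return { kind: 'events' as const, events: resp.events };
    }
    // Otherwise the queue has presumably expired on the server; fall through.
  }
  // Expensive path: re-fetch everything from scratch and open a fresh queue.
  const response = await fetch(`${realm}/api/v1/register`, {
    method: 'POST',
    headers: { Authorization: auth, 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ apply_markdown: 'true' }).toString(),
  });
  const initial = await response.json();
  return {
    kind: 'initial' as const,
    queue: { queueId: initial.queue_id, lastEventId: initial.last_event_id },
    state: initial,
  };
}
```

The expensive path is what we do today on every cold start; the cheap path is what the goals below aim to make the common case.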
The goals
The main goal here is:
When the user opens the app, we should typically have fresh up-to-date server data to show them.
(For this issue, we'll stick to doing so for one account at a time. Doing so for multiple accounts is #5005.)
This means that the user will immediately see the Zulip messages they have, rather than sit there waiting for them to load.
To accomplish this, we should:
Keep the event queue between runs of the app, so that if the old event queue is still active, we pick up where we left off (see the sketch after this list).
Keep the user's server data up to date in the background, by polling the event queue in between times that the app is in the foreground. (For example, we could trigger this when we get a notification, and fetch the messages in that conversation. We could then go further by occasionally sending background notifications, and having the client update the set of unreads and perhaps fetch some messages.)
Thus far, we'd have great behavior but only if the user has used the app in the last 10 minutes -- longer than that, and the event queue on the server would have expired so we'd be forced to reload from scratch after all. To go further, we should:
Keep the event queue alive longer on the server.
An hour or two would be enough to help significantly for someone using the app regularly during the day.
A day would let the app remain fresh overnight. In combination with background updates, this could keep it fresh indefinitely for a user in a busy realm.
To keep the app's data up to date for users in all kinds of realms, we could keep event queues alive for a longer period, like a week, while sending "heartbeat" background notifications at a slightly shorter interval to prompt the app to check in.
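As a sketch of the first sub-goal above, keeping the event queue between runs mostly means remembering the queue_id and last_event_id the server gave us. This rough TypeScript illustration assumes React Native's community AsyncStorage package purely for concreteness; the real app persists its Redux store instead, and the key and shapes here are hypothetical.

```ts
// Rough sketch, not the real implementation.
import AsyncStorage from '@react-native-async-storage/async-storage';

type QueueState = { queueId: string; lastEventId: number };

const QUEUE_KEY = 'eventQueueState'; // hypothetical storage key

// Call after every successful /events poll, so the next launch knows exactly
// which events have already been applied to the persisted data.
async function saveQueueState(queue: QueueState): Promise<void> {
  await AsyncStorage.setItem(QUEUE_KEY, JSON.stringify(queue));
}

// Call at startup: if this returns a queue and the server still has it, we can
// show the persisted data immediately and catch up from events (as in the
// earlier sketch); only if the queue has expired do we reload from scratch.
async function loadQueueState(): Promise<QueueState | null> {
  const raw = await AsyncStorage.getItem(QUEUE_KEY);
  return raw === null ? null : (JSON.parse(raw) as QueueState);
}
```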
Two of these three changes -- keeping the event queue between runs, and keeping it alive longer on the server -- are very small code changes in themselves. The main work to be done for them is in dealing with certain risks and challenges:
The fact that we regularly re-load all server data from scratch helps us paper over many small gaps and bugs in how we handle Zulip events (and other fetches of data) to maintain that data over time. See in particular:
I think we can cheerfully start keeping data somewhat longer than we do today with the existing state of these bugs. But we should crank the duration up gradually, and we should spend some time resolving those issues one by one before we take things to a point where many users don't routinely get any from-scratch reloads at all.
Similarly, today if a user encounters buggy behavior in the Zulip app, they can force-quit and relaunch it, and if the bug had to do with how we maintain server data then that will clear the issue by causing a reload from scratch.
I think it's OK to just give that up -- I can't recall the last time I used this workaround myself, nor the last time we heard from a user doing so.
Event queues consume RAM on the server.
This consumption builds up as things happen in the user's realm, and gets cleared out when the client polls for events and acknowledges their receipt. If the queue sticks around longer between getting polled by the client, then it consumes more RAM -- plus, more queues may be alive at a time. So longer-lived queues will have a resource cost on the server.
I don't have a clear sense of how big that cost looks quantitatively. Some things we could do to mitigate it include:
When an event queue gets long and/or old, we could downgrade it to a more compact form, one that's still enough to reliably bring the state back up to date (but potentially with the client making additional requests when it returns, to fill in the details). This idea is #3916, "Support downgrading to a long-lived event queue", and zulip#12926, "events: Add basic downgradeable event queue support".
I believe the events fetched on a given round of polling don't get dropped from the queue right away, but only on the next round -- because that's when the client tells the server it has indeed received those events. This is good for reliability, but makes the queues longer in steady state than they could be.
When polling for events after a long period, we could make a second poll promptly afterward, to let the server drop those events from the queue; see the sketch below. (It might even be useful to add a feature to the API for the client to purely acknowledge events, without asking the server to send it any new ones.)
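Here's a rough sketch of that double-poll idea, reusing the hypothetical `QueueState` and `get` helpers from the earlier sketch; the real mechanics would need to fit into the app's normal polling loop.

```ts
// The first request fetches the backlog; the second, non-blocking request
// exists only to report the new `last_event_id`, which lets the server prune
// the acknowledged events from the queue now rather than at the next long-poll.
async function catchUpAndAck(realm: string, auth: string, queue: QueueState): Promise<QueueState> {
  const resp = await get(realm, auth, 'events', {
    queue_id: queue.queueId,
    last_event_id: String(queue.lastEventId),
    dont_block: 'true',
  });
  if (resp.result !== 'success' || resp.events.length === 0) {
    return queue;
  }
  const lastEventId: number = resp.events[resp.events.length - 1].id;
  // ...apply resp.events to the local data here...

  // Acknowledge promptly. (Any events this second poll happens to return would
  // still need to be handled by the normal polling loop; omitted here.)
  await get(realm, auth, 'events', {
    queue_id: queue.queueId,
    last_event_id: String(lastEventId),
    dont_block: 'true',
  });
  return { ...queue, lastEventId };
}
```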
But before optimizing, we should measure. That is:
We should get a sense of the existing cost of the event queues, and of how that cost varies when they last longer.