- DAU: 100M
- QPS: Suppose a user posts 20 messages / day
- Average QPS = 100M * 20 / 86400 ~ 20k
- Peak QPS = 20k * 5 = 100k
- Storage: Suppose A user sends 20 messages / day
- 100M * 20 * 30 Bytes = 30G
- NoSQL database
- Large amounts of data
- No modification required
- Schema
Columns | Type | |
---|---|---|
messageId | integer | userID+Timestamp |
threadId | integer | the thread it belongs. Foreign key |
userId | integer | |
content | text | |
createdAt | timestamp |
- SQL database
- Need to support multiple index
- Index by
- Owner user Id: Search all of chat participated by me
- Thread id: Get all detailed info about a thread (e.g. label)
- Participants hash: Find whether a certain group of persons already has a chat group
- Updated time: Order chats by update time
- Schema
- Primary key is userId + threadId
- Why not use UUID as primary key? Need sharding. Not possible to maintain a global ID across different machines. Use UUID, really low efficiency.
- Primary key is userId + threadId
Columns | Type | |
---|---|---|
userId | integer | |
threadId | integer | createUserId + timestamp |
participantsId | text | json |
participantsHash | string | avoid duplicates threads |
createdAt | timestamp | |
updatedAt | timestamp | index=true |
label | string | |
mute | boolean |
- Sender sends message and message receiverId to server
- Server creates a thread for each receiver and message sender
- Server creates a new message (with thread_id)
- How does user receives information
- Pull server every 10 second
- Message table
- NoSQL. Do not need to take care of sharding/replica. Just need to do some configuration.
- Thread table
- According to userId.
- Why not according to threadId?
- To make the most frequent queries more efficient: Select * from thread table where user_id = XX order by updatedAt
- HTTP vs Socket
- HTTP: Only client can ask server for data
- Socket: Server could push data to client
- What if user A does not connect to server
- Relies on Android GCM/IOS APNS
- When a user opens the app, the user connects to one of the socket in Push Service
- If a user is inactive for a long period, drops the connection.
- Each socket connection needs a specific port. So needs a lot of push servers.
- The traffic could be sharded according to user_id
- User A opens the App, it asks the address of push server.
- User A stays in touch with push server by socket.
- User B sends msg to User A. msg is first sent to the message server.
- Message service finds the responsible push service according to the user id.
- Push service sends the msg to A.
- In case of large group chatting
- If there are 500 people in a group, message service needs to send the message to 500 push service. If most of receivers within push service are not connected, it means huge waste of resources.
- Add a channel service
- Each thread should have an additional field called channel. Channel service knows which users are online/offline. For big groups, online users need to first subscribe to corresponding channels.
- When users become online, message service finds the users' associated channel and notify channel service to subscribe. When users become offline, push service knows that the user is offline and will notify channel service to subscribe.
- Channel service stores all info inside memory.
- Each thread should have an additional field called channel. Channel service knows which users are online/offline. For big groups, online users need to first subscribe to corresponding channels.
- When users become online, send a heartbeat msg to the server every 3-5 seconds.
- The server sends its online status to friends every 3-5 seconds.
- If after 1 min, the server does not receive the heartbeat msg, considers the user is already offline.
- Wasteful because most users are not online