Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "zombie" http output websocket connections #125

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

buger
Copy link

@buger buger commented Oct 3, 2024

At the moment http_server websocket implementation just writes to webscocket, and do not have a way to detect client disconnect proactively. More over, if client disconnects, and it fails write to it, it does not stop the handler, and continues to be inside the endless for loop, untill stream is stopped for !h.shutSig.IsSoftStopSignalled() {.

It cause issues when you have multiple clients connected/disconnected, because essentially it fanout using round robin to them, but if you have closed "zombie" connections, they will also consume events.

It fixes the issue by using websocket native mechanisms like "ping/pong" methods. Also it adds ephemeral "Read", just to immidiately detect the client shutdown.

const (
writeWait = 10 * time.Second
pongWait = 60 * time.Second
pingPeriod = (pongWait * 9) / 10
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those who would read it may find it strange, so here is why:

By setting pingPeriod to 90% of pongWait (i.e., (pongWait * 9) / 10), we ensure that:

  1. The server sends a ping before the pongWait deadline:

    • If pongWait is 60 seconds, pingPeriod becomes 54 seconds.
    • This means the server sends a ping every 54 seconds.
  2. There's a buffer period for the client to respond:

    • The client has 6 seconds (the remaining 10% of pongWait) to respond with a pong before the read deadline (pongWait) expires.
  3. Avoiding premature timeouts:

    • If the pingPeriod were equal to pongWait, the read deadline might expire before the client has a chance to respond to the ping.
    • By setting pingPeriod slightly less than pongWait, we avoid this race condition.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all makes sense! A small nit would be we should consider adding these as configuration variables with the advanced() flag. Think these are sensible defaults in any-case.

pingPeriod = (pongWait * 9) / 10
)

ws.SetReadLimit(512)
Copy link
Author

@buger buger Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the question, on why we need both ping/pong and Read goroutine:

  • Necessity of Ping/Pong with a Read Loop:
    • Idle Clients: The read loop alone cannot detect if an idle client has disconnected unexpectedly.
    • Timely Detection: Ping/pong can detect unresponsive clients within a predictable timeframe (pongWait), regardless of message traffic.
    • Protocol Compliance: Utilizing ping/pong aligns with WebSocket protocol standards and best practices.

@buger
Copy link
Author

buger commented Oct 3, 2024

^ fixed linter

Copy link
Collaborator

@gregfurman gregfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks so much for the contribution and fix 🍱😄 Just a couple of questions/suggestions.

Mostly concerned that the ReadMessage in that go-routine could cause a race-condition when sending a Ping to the same socket we write the message bytes to.

Otherwise, run make fmt && make lint on your local to sort out these linting issues 🙂

Comment on lines +478 to +481
if err := ws.SetWriteDeadline(time.Now().Add(writeWait)); err != nil {
h.log.Warn("Failed to set write deadline for ping: %v", err)
return
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setWriteDeadline method on the connection doesn't ever return an error (despite that confusing function signature)

Suggested change
if err := ws.SetWriteDeadline(time.Now().Add(writeWait)); err != nil {
h.log.Warn("Failed to set write deadline for ping: %v", err)
return
}
ws.SetWriteDeadline(time.Now().Add(writeWait))

Comment on lines +460 to +463
if err := ws.SetWriteDeadline(time.Now().Add(writeWait)); err != nil {
writeErr = err
break
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as below comment:

Suggested change
if err := ws.SetWriteDeadline(time.Now().Add(writeWait)); err != nil {
writeErr = err
break
}
ws.SetWriteDeadline(time.Now().Add(writeWait));

Comment on lines +419 to +425
if err := ws.SetReadDeadline(time.Now().Add(pongWait)); err != nil {
h.log.Warn("Failed to set read deadline: %v", err)
return
}
ws.SetPongHandler(func(string) error {
return ws.SetReadDeadline(time.Now().Add(pongWait))
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to set the read deadline before creating and inside the PongHandler?

go func() {
defer close(done)
for {
_, _, err := ws.ReadMessage()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% clued up on websockets so excuse the ignorance but will doing a ws.ReadMessage() clear all bytes in the buffer after reading? Else, does a WriteMessage fan-out to all consumers on the connection?

I'm concerned about a race condition where the payload we send using the below WriteMessage ends up being consumed in this goroutine instead of to the client where intending to send this.

const (
writeWait = 10 * time.Second
pongWait = 60 * time.Second
pingPeriod = (pongWait * 9) / 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all makes sense! A small nit would be we should consider adding these as configuration variables with the advanced() flag. Think these are sensible defaults in any-case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants