consumer: sendRDY error not propagating #199
For a little bit of extra context: this seems to require a pretty specific set of circumstances for us. When the tunnel drops, sometimes it's detected and a reconnect happens; other times we see this:
When we fall into this mode, we do not observe a reconnect (even though the tunnel would eventually come back up on its own, we'd need to reinitialize the connection to NSQ).
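As an aside, a minimal sketch of the kind of reinitialization being described: a watchdog loop that rebuilds the consumer when it has silently lost all of its connections. `Stats()`, `Stop()`, and `StopChan` are real go-nsq API; the polling interval and the `rebuild` hook are assumptions, not library code.

```go
package sketch

import (
	"log"
	"time"

	"github.com/nsqio/go-nsq"
)

// Hypothetical watchdog: if the consumer reports zero open connections,
// tear it down and build a fresh one. Stats(), Stop(), and StopChan are
// part of go-nsq's public API; everything else here is illustrative.
func watchConsumer(consumer *nsq.Consumer, rebuild func() (*nsq.Consumer, error)) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		if consumer.Stats().Connections > 0 {
			continue // still connected; nothing to do
		}
		consumer.Stop()
		<-consumer.StopChan // wait for the old consumer to finish shutting down
		fresh, err := rebuild()
		if err != nil {
			log.Printf("rebuilding consumer failed: %v", err)
			continue
		}
		consumer = fresh
	}
}
```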
@armcknight @bmhatfield thanks for the detailed info, I'll try to take a look at this!
I just poked around at this. Despite not handling the returned errors in the places noted above, the connection should still end up closed and cleaned up. The only reason why it wouldn't is if messages are in flight and never "complete", meaning the rest of the cleanup logic doesn't execute. This is probably a poor assumption, though; perhaps we should bound this with some timeout. Thoughts?
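A minimal sketch of the kind of bounded wait being proposed, assuming a hypothetical `drained` channel that closes once in-flight messages complete (illustrative only, not go-nsq internals):

```go
package sketch

import "time"

// Hedged sketch of the timeout bound proposed above: rather than waiting
// indefinitely for in-flight messages to complete, give up after a bound
// and run the remaining cleanup anyway. The drained channel, the cleanup
// hook, and the 30s value are all illustrative, not go-nsq internals.
func waitForDrain(drained <-chan struct{}, cleanup func()) {
	select {
	case <-drained:
		// normal path: all in-flight messages completed
	case <-time.After(30 * time.Second):
		// messages never "completed"; don't let cleanup hang forever
	}
	cleanup()
}
```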
We resolved this recently on our end; your explanation is pretty spot-on for what we were experiencing. We had messages still in flight when the RDY count was getting redistributed, which caused the connection with the in-flight messages to close prematurely. We fixed this by upping [...]. I don't understand enough of the inner workings of this package to comment on whether or not there should be a timeout on this operation in the client, so I'll leave that up to you, but hopefully the way we resolved this internally may provide some assistance in making that decision.
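The exact setting referenced above isn't preserved in the thread, so purely as an illustration of the mechanism: go-nsq exposes such knobs on `nsq.Config`. `MsgTimeout` and `MaxInFlight` are real fields; which setting actually applied here, and the values used, are assumptions.

```go
package sketch

import (
	"time"

	"github.com/nsqio/go-nsq"
)

// Illustrative only: two real nsq.Config fields that govern in-flight
// behavior. Which knob the commenters actually raised is unknown; the
// topic, channel, and values here are arbitrary.
func newConsumer() (*nsq.Consumer, error) {
	cfg := nsq.NewConfig()
	cfg.MsgTimeout = 5 * time.Minute // allow in-flight messages longer to complete
	cfg.MaxInFlight = 8              // bound how many messages can be in flight at once
	return nsq.NewConsumer("events", "workers", cfg)
}
```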
Hello,
first of all, apologies if my terminology is a bit off; I'm not a regular Go programmer :)
We run a process reading from NSQ servers over an SSH tunnel. While debugging an issue where this connection breaks, we found a potential problem with how an error from `sendRDY` will not fully propagate.

`sendRDY` possibly emits an error (go-nsq/consumer.go, lines 950 to 964 in d71fb89). `updateRDY`, which calls `sendRDY`, also possibly emits an error (go-nsq/consumer.go, line 907 in d71fb89). But that error isn't handled in its own recursive call (go-nsq/consumer.go, line 940 in d71fb89).

We were thinking that the failure of the error to fully propagate means our process doesn't pick up the loss of connection and doesn't know to attempt a mitigation.
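To make the shape of the problem concrete, here is a hedged, heavily simplified sketch of the pattern described above (not the actual go-nsq source; the names and the retry condition are stand-ins):

```go
package sketch

import "time"

// Simplified sketch of the propagation gap: updateRDY can return an
// error, but the retry it schedules for itself discards that error, so
// a persistent write failure never reaches anything that could react.
type conn struct{}

func sendRDY(c *conn, count int64) error {
	// writes the RDY command to the connection; returns an error on failure
	return nil
}

func updateRDY(c *conn, count int64) error {
	if mustRetryLater(c) {
		time.AfterFunc(5*time.Second, func() {
			updateRDY(c, count) // return value silently dropped -- the gap
		})
		return nil
	}
	return sendRDY(c, count)
}

func mustRetryLater(c *conn) bool { return false } // placeholder condition
```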
We also found a few other invocations of `updateRDY` that don't appear to handle errors; both appear in `startStopContinueBackoff`, which doesn't report that it can return an error (go-nsq/consumer.go, line 761 in d71fb89). The unhandled calls are at line 795 and line 810 in d71fb89.
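As an illustration only, one direction a fix could take at those call sites might look like the following (a sketch assuming it's acceptable to surface or act on the error there; these are not the real go-nsq signatures):

```go
package sketch

import "log"

type conn struct{}

func updateRDY(c *conn, count int64) error { return nil } // stand-in

// Hedged sketch of a fix direction for the call sites listed above:
// check updateRDY's returned error instead of discarding it, so a dead
// connection at least surfaces rather than stalling silently.
func backoffStep(c *conn, count int64) {
	if err := updateRDY(c, count); err != nil {
		log.Printf("failed to update RDY: %v", err)
		// optionally: close the connection here to force the normal
		// cleanup/reconnect path
	}
}
```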