client watch doesn't work after restarting etcd server #1209
Comments
We have a test that does something similar. Can you check it out and provide a PR with a failing test that reproduces your use case?
This test is passing.
I'm very sorry, but I don't have much time for manual testing, so to investigate further I need a reproducer in the form of an integration test.
Try with a sleep of 1 min between stop and start in EtcdCluster::restart. The test then starts to fail with a timeout (the WatchResumeTest timeout was changed from 30 to 100 sec).
It seems that the system reaches the maximum number of retry attempts or the timeout. Have you set up an error handler on the watch listener?
Yes, it just prints the stack trace. I get the error below irrespective of whether the test passes or fails. The test fails on this line when the sleep between stop and start is 1 min. I also tried creating another client (before the restart) and used it to put the key with a new value. With this, the test succeeds in putting the value but fails in the assertion.
Can you please at least share your code?
Sure, I'm reusing the WatchResumeTest.java that you shared, with minor changes. Do you need logs as well?
@bhagyalakshmi1218 I added some tests in #1210 with increasing timeouts, and all the tests are passing. So I guess we are not testing the same thing, or there are some differences in the setup. Can you please provide a reproducer, in the form of an integration test, that I can run as part of the test suite?
@lburgazzoli Did you try with 2 clients as well?
@bhagyalakshmi1218 No. As I said, I'm really sorry, but I'm the only maintainer left and I'm doing this in my spare time, so I don't have much time for a trial-and-error approach. I really need a test that fails so I can have a look at it.
I think I know what's happening, but I don't have a clear solution at this stage. The failure occurs because the reconnect backoff policy implemented in grpc-java kicks in, and at some point the backoff becomes very long, so if you increase the timeout of your test it should work. At least, I tried setting up jetcd as a watcher and then using etcdctl to put data on the cluster, and yes, it takes a while. Note that there are a number of issues related to reconnect behavior and
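To illustrate why a longer outage can push the first successful reconnect well past the moment the server comes back, here is a minimal sketch of an exponential backoff schedule. It is not grpc-java's actual code: the class `BackoffSketch` and its methods are hypothetical, the constants (1 s initial delay, 1.6 multiplier, 120 s cap) are assumed from the gRPC connection backoff specification, and jitter and connect timeouts are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of jetcd or grpc-java.
final class BackoffSketch {
    // Assumed values, taken from the gRPC connection backoff specification.
    static final double INITIAL_SECONDS = 1.0;
    static final double MULTIPLIER = 1.6;
    static final double MAX_SECONDS = 120.0;

    private BackoffSketch() {}

    /** Returns the delay (in seconds) applied before each of the first n retries, without jitter. */
    static List<Double> delays(int n) {
        List<Double> out = new ArrayList<>();
        double delay = INITIAL_SECONDS;
        for (int i = 0; i < n; i++) {
            out.add(delay);
            // Grow the delay geometrically, but never beyond the cap.
            delay = Math.min(delay * MULTIPLIER, MAX_SECONDS);
        }
        return out;
    }

    /**
     * Returns the time (in seconds since the first failed attempt at t = 0)
     * of the first connection attempt made at or after the server comes back.
     */
    static double firstAttemptAfter(double downtimeSeconds) {
        double t = 0.0;
        double delay = INITIAL_SECONDS;
        while (t < downtimeSeconds) {
            t += delay;
            delay = Math.min(delay * MULTIPLIER, MAX_SECONDS);
        }
        return t;
    }
}
```

Under these assumptions, `BackoffSketch.firstAttemptAfter(60)` lands several seconds after the server is back, and the gap keeps growing with longer outages until the delay reaches the cap. Real timings will differ because of jitter and per-attempt connect timeouts.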
@bhagyalakshmi1218 Can you please confirm the behavior? Unfortunately, there's not much I can do on the jetcd side, as the reconnection logic is hard-coded in the grpc-java library, so I think we should close this issue.
Versions
Describe the bug
etcd watches are not notified after the etcd server is restarted.
To Reproduce
Expected behavior
Changes are notified to the client after the etcd server is restarted.