When the connection to etcd is broken and etcd is replaced by a different instance, dfuse search fails to update its connection and stays broken. The health check also keeps reporting "healthy", which makes this situation hard to detect through monitoring.
One possible solution:
Add a mechanism that detects that the gRPC connection to etcd is broken, then just exit and wait to be restarted by k8s, systemd, or a similar supervisor.
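A minimal sketch of that idea, not dfuse code: a watchdog that counts consecutive failed probes and exits the process once a threshold is reached, so the supervisor restarts it. The `EtcdWatchdog` class, the `probe` callable, and the `max_failures` threshold are all hypothetical names for illustration. What the probe actually does (for example a round trip to etcd that confirms the service's own registration is still there) is an assumption left to the caller.

```python
import time
from typing import Callable


class EtcdWatchdog:
    """Counts consecutive failed probes and decides when the process should exit."""

    def __init__(self, probe: Callable[[], bool], max_failures: int = 3):
        # probe is a hypothetical callable: True means etcd answered as expected.
        self.probe = probe
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> bool:
        """Run one probe. Return True while the process should keep running."""
        try:
            ok = self.probe()
        except Exception:
            # Any error from the probe counts as a failed check.
            ok = False
        self.failures = 0 if ok else self.failures + 1
        return self.failures < self.max_failures

    def healthy(self) -> bool:
        """Value a health endpoint could report instead of a constant 'healthy'."""
        return self.failures == 0

    def run(self, interval_seconds: float = 5.0) -> None:
        """Probe forever; exit with a non-zero status once the threshold is hit."""
        while self.check():
            time.sleep(interval_seconds)
        raise SystemExit(1)
```

Exiting with a non-zero status is what lets k8s or systemd treat the process as failed and restart it, and `healthy()` gives the health endpoint something that reflects the state of the etcd connection.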
Scenario is probably something like this:
archive A tells etcd that it serves blocks 1000->2000 (BUT THAT ETCD IS GONE, REPLACED BY NEW REBUILT CLUSTER !!!)
router checks etcd, reads this and sends a query to archive A down to block 1000 (BUT THAT ETCD IS GONE, SO NO UPDATES !!!)
archive A says: hey I don't have block 1000, my lowest block is 1100 ("I TRIED TO TELL YOU VIA ETCD BUT MY UPDATE IS STALLED")
Workaround: manually restart the router and archives. They connect to the new etcd and everything works again.
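The scenario above can be reproduced with a toy model. This is only an illustration under assumptions: a plain dict stands in for etcd, and `Archive`, `route`, and the variable names are hypothetical, not dfuse search types.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Archive:
    name: str
    low: int
    high: int

    def serves(self, block: int) -> bool:
        """True if the archive really holds this block right now."""
        return self.low <= block <= self.high


def route(registry: Dict[str, Tuple[int, int]], block: int) -> Optional[str]:
    """Pick an archive whose *published* range covers the block."""
    for name, (low, high) in registry.items():
        if low <= block <= high:
            return name
    return None


# Archive A publishes its range; this is the last update the router ever sees.
archive_a = Archive("A", low=1000, high=2000)
registry = {"A": (archive_a.low, archive_a.high)}

# The archive's lowest block moves to 1100, but the update is stalled
# because the etcd it was connected to is gone.
archive_a.low = 1100

# The router still trusts the stale entry and sends block 1000 to archive A.
target = route(registry, 1000)
stale_failure = target == "A" and not archive_a.serves(1000)

# After a restart the archive publishes its real range again,
# and the router stops sending block 1000 to it.
registry["A"] = (archive_a.low, archive_a.high)
after_restart = route(registry, 1000)
```

Here `stale_failure` ends up `True` and `after_restart` is `None`, which matches the failing query before the restart and the correct routing after it.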