Using Keepalive for TCP Probe? #149
-
Hi @JamesYYang 👋, there are some things that we may need to consider.
@haoel I think we should make this a configurable option that enables keepalive for a TCP probe, but I don't have the slightest idea how complicated the implementation may be.
-
@JamesYYang thanks for raising this. I have the following thoughts.

EaseProbe's purpose

Keepalive is good for reusing a TCP connection, but I don't think it suits EaseProbe. EaseProbe probes not only the service but also the network, which means we need to know the latency of DNS resolution, TCP connection setup, TLS negotiation, and so on. We also need to check whether we can still establish a new connection to the server; if we cannot, the server should be marked as down. Besides, we need to calculate the round-trip time, which could become a factor in the health evaluation in the future. So every probe must start over and do the work from scratch. Without this, EaseProbe is functionally broken.

Is TIME_WAIT a problem?

TIME_WAIT lasts 2MSL, which I think should be fine: on Linux it is 60s, on Mac it is 15s. The default probe interval is 60s, so I don't think TIME_WAIT will be a problem.

What's a good design?

Different people could have different answers, and so could different situations. This is a complicated topic, but I quite like two principles: 1) KISS, and 2) the benefit-cost ratio. TCP keepalive would introduce complexity and bring few benefits, so its benefit-cost ratio is not good. Let's just keep it simple and stupid.
-
Actually, DNS, TCP, and TLS should NOT be ignored by the probe. I've seen many DNS, TCP, and TLS network issues in many companies.
Since you probe your service so frequently, I believe you are trying to find all of the failures, so you shouldn't ignore DNS, TCP, or TLS failures. Using keepalive to reuse the connection not only loses the complete network probe functionality, but also introduces state into your service, and you then have to manage many situations that make things much more complicated. If your X-Problem is just the TIME_WAIT issue, I think you should take a better route than keepalive. I have several solutions for you to solve the X-Problem.
-
@JamesYYang PR #159 would address the TIME_WAIT issue completely. Please take a look.
-
Looking at this again, everything is the same, yet the people have changed... 😞
-
I think closing the TCP connection immediately after it is established is not a good design. It will cause a lot of sockets to end up in the TIME_WAIT state.
So I think the TCP probe could keep the connection open until the server times out or an error occurs, and report a failure if reconnection fails.
What do you think?