So as a short-term fix, you should be able to just restart the vector service. If memory is still not shrinking after that, it is an indicator that there is queue build-up in vector.
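A minimal sketch of that short-term mitigation, assuming vector runs as the vector.service systemd unit (the cgroup path in the ss output further down shows /system.slice/vector.service):

```bash
# Restart the vector systemd unit as a short-term mitigation.
sudo systemctl restart vector.service

# Check that the allocated sockets and reserved TCP memory pages drop afterwards.
grep '^TCP:' /proc/net/sockstat
```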
If you could get us the following (the commands are sketched after this list):
a juju debug-log --replay for cos-proxy, as well as for loki.
a dump of what vector top is showing if you leave it running for a couple of minutes.
a juju status --relations in the model cos-proxy resides in.
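A consolidated sketch of those commands; the model names and output file names below are placeholders:

```bash
# Replay the debug logs for the models hosting cos-proxy and loki.
juju debug-log --replay -m cos-proxy-model > cos-proxy-debug-replay.log
juju debug-log --replay -m loki-model > loki-debug-replay.log

# On the cos-proxy unit, leave this running for a couple of minutes
# and capture what it shows.
vector top

# Relation overview for the model cos-proxy resides in.
juju status --relations -m cos-proxy-model > juju-status-relations.txt
```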
Bug Description
The memory usage keeps increasing:
It seems we also have orphaned sockets that are not closed by the vector service:
tcp CLOSE-WAIT 153530 0 10.252.20.82:5044 10.252.20.125:39332 users:(("vector",pid=1028,fd=16)) ino:1197351486 sk:6572e cgroup:/system.slice/vector.service -->
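A quick way to count how many of these CLOSE-WAIT sockets are still attributed to the vector process (a sketch; run as root so ss can resolve the owning process):

```bash
# Count CLOSE-WAIT sockets held by the vector process.
sudo ss -tnp state close-wait | grep -c '"vector"'
```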
Looking at the "/proc/net/sockstat" output:
TCP: inuse 302 orphan 5 tw 4 alloc 866299 mem 20649344
We see that 866299 sockets are allocated, and roughly 79 GiB worth of memory pages are reserved for them (the mem column counts 4 KiB pages, not bytes; the actual memory usage is most likely lower than this).
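As a sanity check, the mem column can be converted to GiB directly from /proc/net/sockstat (a quick sketch, assuming 4 KiB pages):

```bash
# 20649344 pages * 4096 bytes/page / 2^30 ~ 79 GiB reserved.
awk '/^TCP:/ {printf "%.1f GiB\n", $NF * 4096 / 1024^3}' /proc/net/sockstat
```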
The "pidstat_-p_ALL_-rudvwsRU_--human_-h" output shows that the "vector" process has 859617 file descriptors open. Combining that finding with the active socket count of ~3000 866299 allocated sockets in sockstat, it points to the "vector" process not properly cleaning up the sockets and leaking them, resulting in a TCP memory exhaustion issue. This is consistent with everything being back to normal after restarting the "vector".
We also see the logs below on the unit:
We tried adding the "keepalive" option to the vector configuration, but it did not help:
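The exact keepalive stanza is not shown here, but whether TCP keepalive actually took effect on the accepted connections can be verified from the socket timers (a sketch, assuming the source listens on port 5044 as in the ss output above):

```bash
# An armed keepalive shows up in the timer column as timer:(keepalive,...).
ss -tno state established '( sport = :5044 )'
```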
To Reproduce
Unfortunately, I do not have steps to reproduce this.
I think this issue could also be related to Loki and may be relevant to #130, although large_dir is already enabled for the device; that did not help either.
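For completeness, large_dir can be confirmed from the filesystem feature list (a sketch, assuming an ext4 filesystem; the device path is a placeholder):

```bash
# large_dir should appear in the features line of the device backing the Loki data directory.
sudo tune2fs -l /dev/sdX | grep -i 'filesystem features'
```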
We also tried tuning some Loki settings, but they did not help either:
Environment
cos-proxy charm revision is 117.
Relevant log output
Aug 19 08:53:47 oscomputend3 kernel: [397430.141313] TCP: out of memory -- consider tuning tcp_mem
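That message refers to the net.ipv4.tcp_mem limits, which are expressed in 4 KiB pages (min / pressure / max). A sketch of how to inspect them; note that raising them only buys time while the socket leak persists:

```bash
# Current TCP memory limits and current usage.
sysctl net.ipv4.tcp_mem
cat /proc/net/sockstat

# Raising the limits (values below are purely illustrative) is only a stopgap:
# sudo sysctl -w net.ipv4.tcp_mem="<min> <pressure> <max>"
```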
Additional context
Please let me know if anything else is needed for the resolution.