
Get network statistics per container #35

Open
ashangit opened this issue Nov 20, 2018 · 8 comments
Labels
enhancement New feature or request

Comments

@ashangit
Contributor

Some of our users would like to have some network statistics from containers, like:

  • number of bytes
  • number of packets
  • errors/drops...

To get those metrics we can rely on the /proc/<pid>/net/dev file and use the same mechanism as the process-tree one used on NodeManagers to get the memory/vcores used by a container.
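The proposed approach could be sketched like this in Python (a minimal illustration, not the project's actual NodeManager code; the function name and the choice of fields are my own). It parses the per-interface counters that /proc/<pid>/net/dev exposes:

```python
def parse_net_dev(text):
    """Parse the contents of /proc/<pid>/net/dev.

    Returns {iface: {"rx_bytes": ..., "rx_packets": ..., "rx_errs": ...,
                     "rx_drop": ..., "tx_bytes": ..., "tx_packets": ...}}.
    """
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        iface, data = line.split(":", 1)
        fields = data.split()
        # Receive columns come first (8 fields), then transmit columns.
        stats[iface.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats

# Sample content in the format of /proc/<pid>/net/dev (values made up).
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1234567   8910    0    0    0     0          0         0  7654321   1098    0    0    0     0       0          0
    lo:     512      8    0    0    0     0          0         0      512      8    0    0    0     0       0          0
"""

print(parse_net_dev(sample)["eth0"]["rx_bytes"])
```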

@ashangit ashangit added the enhancement New feature or request label Nov 20, 2018
@jpbempel
Contributor

jpbempel pushed a commit to jpbempel/garmadon that referenced this issue Nov 20, 2018
@ashangit
Contributor Author

From my understanding this will provide metrics from the OS point of view, not from the containers, no?

@jpbempel
Contributor

@ashangit /proc/<pid>/net/dev provides interface-level stats, NOT per-process stats!
So it doesn't change anything.

@jpbempel
Contributor

AFAICT packets & errors are counted at the interface level, so you cannot get them per process.
You can get bytes received/sent in different ways, for example at the Hadoop level, but not packets or errors, which carry no information about which socket they are associated with.

@ashangit
Contributor Author

Ok, my bad.
So we need to find another way. We also can't get it from Hadoop, as we have more and more "non-JVM" containers (Python, TensorFlow...).

@jpbempel
Contributor

@ashangit It doesn't seem to be obvious. nethogs (https://github.com/raboof/nethogs) uses libpcap to decode packet headers and reads the packet length to compute the real-time bandwidth used by each process.
But it cannot reconstruct history, so you would need to run it permanently to get the total quantity of bytes/packets received/sent.
I don't think this is sustainable for our use case.

Honestly, I don't see a solution for attributing network activity per process.

@jpbempel
Contributor

jpbempel commented Nov 22, 2018

Well, after some research, I may have found a solution in ss -tinp:

State       Recv-Q Send-Q                   Local Address:Port                                  Peer Address:Port
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:57375               users:(("sshd",pid=9990,fd=3))
         cubic rto:201 rtt:0.241/0.054 ato:40 mss:1460 cwnd:10 bytes_acked:94885 bytes_received:43212 segs_out:303 segs_in:459 send 484.6Mbps lastsnd:757714 lastrcv:756996 lastack:756996 pacing_rate 965.8Mbps rcv_rtt:311220 rcv_space:29532
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:56338               users:(("sshd",pid=7363,fd=3))
         cubic rto:201 rtt:0.449/0.182 ato:47 mss:1460 cwnd:10 ssthresh:16 bytes_acked:899733 bytes_received:255916 segs_out:4888 segs_in:8044 send 260.1Mbps lastsnd:15 lastrcv:16 lastack:15 pacing_rate 519.8Mbps rcv_rtt:451431 rcv_space:78608
ESTAB       0      0                            10.0.2.15:58400                                89.30.125.167:25                  users:(("telnet",pid=15543,fd=3))
         cubic rto:205 rtt:4.11/2.055 ato:40 mss:1460 cwnd:10 bytes_acked:1 bytes_received:24 segs_out:3 segs_in:2 send 28.4Mbps lastsnd:531253 lastrcv:531235 lastack:531235 pacing_rate 56.4Mbps rcv_space:29200
ESTAB       0      0                            10.0.2.15:22                                        10.0.2.2:50323               users:(("sshd",pid=2421,fd=3))
         cubic rto:201 rtt:0.654/0.244 ato:40 mss:1460 cwnd:8 ssthresh:7 bytes_acked:3058309 bytes_received:1102444 segs_out:19837 segs_in:35236 send 142.9Mbps lastsnd:531234 lastrcv:531432 lastack:531233 pacing_rate 285.6Mbps retrans:0/4 rcv_rtt:240486 rcv_space:54912
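In that output, ss pairs each socket line (carrying users:(("name",pid=...,fd=...))) with an indented TCP_INFO line carrying bytes_acked / bytes_received. A minimal Python sketch of aggregating those counters per process (the function and regex names are my own; note that bytes_acked / bytes_received are cumulative per connection, so sockets that close between two polls would be missed):

```python
import re
from collections import defaultdict

# Matches the process annotation on a socket line, e.g. users:(("sshd",pid=9990,fd=3))
PID_RE = re.compile(r'users:\(\("(?P<name>[^"]+)",pid=(?P<pid>\d+)')
ACKED_RE = re.compile(r'bytes_acked:(\d+)')
RECV_RE = re.compile(r'bytes_received:(\d+)')

def per_pid_bytes(ss_output):
    """Aggregate bytes_acked / bytes_received from `ss -tinp` output,
    keyed by (process name, pid)."""
    totals = defaultdict(lambda: {"sent": 0, "received": 0})
    current = None
    for line in ss_output.splitlines():
        m = PID_RE.search(line)
        if m:
            # Socket header line: remember which process owns the next info line.
            current = (m.group("name"), int(m.group("pid")))
            continue
        if current:
            # Indented TCP_INFO line belonging to the previous socket.
            acked = ACKED_RE.search(line)
            recv = RECV_RE.search(line)
            if acked:
                totals[current]["sent"] += int(acked.group(1))
            if recv:
                totals[current]["received"] += int(recv.group(1))
            current = None
    return dict(totals)

# Two sockets taken from the output shown above (info lines shortened).
sample = """State       Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB       0      0      10.0.2.15:22 10.0.2.2:57375 users:(("sshd",pid=9990,fd=3))
\t cubic rto:201 rtt:0.241/0.054 bytes_acked:94885 bytes_received:43212 segs_out:303 segs_in:459
ESTAB       0      0      10.0.2.15:58400 89.30.125.167:25 users:(("telnet",pid=15543,fd=3))
\t cubic rto:205 rtt:4.11/2.055 bytes_acked:1 bytes_received:24 segs_out:3 segs_in:2
"""

print(per_pid_bytes(sample)[("sshd", 9990)])
```

Polling this periodically and diffing against the previous snapshot would give per-container deltas, at the cost of running ss on every poll, which is the load concern raised below.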

@ashangit
Contributor Author

Looks to be a good starting point; I just have some concerns about the impact it could have on loaded servers.
Let's discuss it IRL next week.

jpbempel pushed a commit to jpbempel/garmadon that referenced this issue Nov 26, 2018
jpbempel pushed a commit to jpbempel/garmadon that referenced this issue Nov 26, 2018
jpbempel pushed a commit to jpbempel/garmadon that referenced this issue Dec 6, 2018
jpbempel pushed a commit to jpbempel/garmadon that referenced this issue Dec 6, 2018
2 participants