Performance improvements via TSO/GRO and UDP_SEGMENT #439
Comments
If you look at benchmarks of tinc, you will quickly find that for many real-world workloads the largest consumer of CPU time is TUN/TAP. I did some work on sendmmsg in the past but ran into issues that were primarily architectural: tinc was never built to handle a queue of packets (but this can change!).

If you really want performance for tinc, build ktincd (Linux kernel tinc). I've debated it numerous times. It was originally going to be one of my next experiments after the AES protocol changes merged (but they never did). The networking side of it wouldn't be too hard; tinc is structured well enough that adapting it to a Linux netdev would not be too difficult. Configuration, though, is potentially a real nightmare.
Another option might be investigating whether io_uring can be used, and what performance improvements it can give.
I don't know if io_uring is really worth the effort, to be honest, though I don't have strong data to back this up. Packet mmap appears to be the fastest way to read from and send to a tap device. See for example https://github.com/google/gvisor/blob/master/pkg/tcpip/link/fdbased/mmap_unsafe.go#L50 However, tinc doesn't have the architecture in place for batching on the tap side, and that's what holds me back. I'm not certain I want to make that level of change without guidance.
The advantage of io_uring is that you don't have to batch things at all in the application. You can still do single packet
This blog post by Tailscale sounds promising. It points out that the Linux TUN device supports TSO/GRO offloading.
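For readers unfamiliar with the TUN side, here is a minimal sketch of how such offloads are typically requested on Linux. It is not taken from tinc or from the blog post. The function names `tun_open_offload`, `tun_read_pkt` and `vnet_pkt_is_gso` are made up for illustration, and the choice of offload flags is an assumption. The ioctls and structures come from `<linux/if_tun.h>` and `<linux/virtio_net.h>`.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/virtio_net.h>

/* Open a TUN device that prefixes every packet with a struct virtio_net_hdr
 * and ask the kernel to pass large TCP segments through unsplit.
 * Hypothetical helper; returns the fd or -1. Needs CAP_NET_ADMIN. */
int tun_open_offload(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_VNET_HDR;
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }

    /* The flag set is an assumption: the TSO flags only take effect
     * together with TUN_F_CSUM. The argument is passed by value. */
    unsigned int offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6;
    if (ioctl(fd, TUNSETOFFLOAD, offloads) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read one (possibly oversized) packet. The header is copied to *hdr and the
 * payload length is returned; the payload starts at buf + sizeof(*hdr). */
ssize_t tun_read_pkt(int fd, unsigned char *buf, size_t cap,
                     struct virtio_net_hdr *hdr)
{
    ssize_t n = read(fd, buf, cap);
    if (n < (ssize_t)sizeof(*hdr))
        return -1;
    memcpy(hdr, buf, sizeof(*hdr));
    return n - (ssize_t)sizeof(*hdr);
}

/* Non-zero if the payload is one large segment that the application must
 * split into hdr->gso_size pieces itself (or forward as a batch). */
int vnet_pkt_is_gso(const struct virtio_net_hdr *hdr)
{
    return (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) != VIRTIO_NET_HDR_GSO_NONE;
}
```

With this setup the read buffer has to hold up to roughly 64 KiB plus the header, and every `write()` to the device must also start with a `struct virtio_net_hdr`. A daemon that handles one MTU-sized packet at a time would need to segment such reads before encrypting, which is where the batching concerns raised in the comments come in.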
Also, there is another post for using GSO (Generic Segmentation Offload) to send multiple UDP packets from a single large buffer.
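As a rough illustration of the UDP side, the sketch below sends one large buffer with the `UDP_SEGMENT` control message (available since Linux 4.18) so that the kernel cuts it into datagrams of `seg_size` bytes. The helper names `send_udp_gso` and `gso_segment_count` are hypothetical and not part of tinc. The fallback definitions are only there for C libraries whose headers lack the constants.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103   /* value from <linux/udp.h> */
#endif
#ifndef SOL_UDP
#define SOL_UDP 17        /* same number as IPPROTO_UDP */
#endif

/* Number of datagrams the kernel will emit for len bytes of payload:
 * every datagram carries seg_size bytes except possibly the last one. */
size_t gso_segment_count(size_t len, uint16_t seg_size)
{
    if (seg_size == 0)
        return 0;
    return (len + seg_size - 1) / seg_size;
}

/* Send buf as a train of UDP datagrams with a single sendmsg() call.
 * dst may be NULL on a connected socket. Returns bytes sent or -1. */
ssize_t send_udp_gso(int fd, const struct sockaddr *dst, socklen_t dstlen,
                     const void *buf, size_t len, uint16_t seg_size)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(uint16_t))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_name = (void *)dst;
    msg.msg_namelen = dstlen;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* Per-call segment size; the kernel expects a 16-bit value here. */
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));

    return sendmsg(fd, &msg, 0);
}
```

For example, a 4000-byte buffer with `seg_size` 1400 would leave the host as three datagrams of 1400, 1400 and 1200 bytes. All segments go to the same destination, so this only helps when several queued packets share a peer, and the total length is still bounded by the maximum UDP payload size.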
Both techniques reduce the number of network stack traversals. Unfortunately, these features do not seem to be well documented.