[Feature Request] Performance analysis: Multi-threading in 1.14.1 doubles throughput, potential for further improvement #2386

Open
jklop123 opened this issue Oct 11, 2024 · 0 comments


1.14.1 adds multi-threading support. After tuning, throughput in the same environment improves from 1.5 Gbits/sec to 3 Gbits/sec with encryption, or 5 Gbits/sec unencrypted 🎉

Test environment: Proxmox VE (PVE); Intel Xeon Gold 6330; Linux x64.

  • Client: 4 cores / 4 threads, 8 GB RAM; ZeroTier 1.14.1
  • Server: 16 cores / 16 threads, 16 GB RAM; ZeroTier 1.14.1
  • Client and server are connected directly via a PVE vmbr bridge, which is not a bottleneck: a baseline iperf3 test over this link measures 30 Gbits/sec

Benchmark: iperf3 with 8 or more parallel TCP streams, run for 30 s.

With multi-threading disabled on both client and server, performance is around 1.5 Gbits/sec.

With multi-threading enabled and "trustedPathId" configured for no encryption (optional):

Client configuration:

{
    "physical":
    {
        "192.168.0.0/24":
        {
            "trustedPathId": 101010024
        }
    },
    "settings":
    {
        "multicoreEnabled": true,
        "concurrency": 4,
        "cpuPinningEnabled": true
    }
}

Server configuration:

{
    "physical":
    {
        "192.168.0.0/24":
        {
            "trustedPathId": 101010024
        }
    },
    "settings":
    {
        "multicoreEnabled": true,
        "concurrency": 8,
        "cpuPinningEnabled": true
    }
}

Bandwidth reaches 3 Gbits/sec encrypted, or 5 Gbits/sec unencrypted 🎉

  • When multiple clients test simultaneously, the aggregate result remains 3 Gbits/sec encrypted, or 5 Gbits/sec unencrypted

1.14.1 Performance Bottleneck

This bottleneck analysis may contain errors. Please point out any inaccuracies, thank you very much.

  • The multi-threading added in this commit does not include decryption; there is still only one thread for decryption.
  • int bucket = flowId % _concurrency; always selects the same bucket in this test, so only one thread does the work.
  • In the test scenario, the decryption thread is fully loaded, so throughput does not scale with the additional threads. Below are flame graphs of this thread:

[Two flame graph images of the decryption thread: 121494, 138784]
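To illustrate the second point, here is a minimal sketch of how a modulo bucket choice behaves. It is not ZeroTier code: bucketHistogram, activeBuckets, and the flowIds input are hypothetical names, and the assumption that every packet in the test carries the same flowId is only one possible explanation for the single busy thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many packets land in each bucket when the bucket is chosen
// as flowId % concurrency (the expression quoted in the list above).
// flowIds is a hypothetical list of per-packet flow identifiers.
std::vector<std::size_t> bucketHistogram(const std::vector<uint64_t>& flowIds,
                                         unsigned concurrency)
{
    assert(concurrency > 0);
    std::vector<std::size_t> counts(concurrency, 0);
    for (uint64_t flowId : flowIds) {
        counts[flowId % concurrency] += 1;
    }
    return counts;
}

// Number of buckets that received at least one packet.
std::size_t activeBuckets(const std::vector<std::size_t>& counts)
{
    std::size_t active = 0;
    for (std::size_t c : counts) {
        if (c > 0) {
            ++active;
        }
    }
    return active;
}
```

If all packets share one flowId, activeBuckets returns 1 regardless of concurrency; with varied flowIds the packets spread over all buckets. Whether the flowId is actually constant in this test would need to be confirmed against the source.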

A straightforward idea is to add more decryption threads to resolve this bottleneck.
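As a rough sketch of that idea (not ZeroTier's actual code path), the following splits a batch of packets across several worker threads. Packet, fakeDecrypt, and decryptParallel are hypothetical names, and the XOR routine is only a stand-in for real decryption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical packet type: just a byte buffer.
using Packet = std::vector<uint8_t>;

// Stand-in for per-packet decryption. This is NOT real cryptography.
void fakeDecrypt(Packet& p, uint8_t key)
{
    for (uint8_t& b : p) {
        b ^= key;
    }
}

// Process all packets in place using `workers` threads.
// Worker w handles packets w, w + workers, w + 2 * workers, ...
// so no two threads ever touch the same packet and no locking is needed.
void decryptParallel(std::vector<Packet>& packets, uint8_t key, unsigned workers)
{
    assert(workers > 0);
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&packets, key, w, workers] {
            for (std::size_t i = w; i < packets.size(); i += workers) {
                fakeDecrypt(packets[i], key);
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
}
```

A real implementation would also have to consider per-flow packet ordering and shared cipher state, which this sketch ignores.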

Conclusion

These tests confirm that the multi-threading added in 1.14.1 yields a 2x throughput improvement for ZeroTier with encryption enabled. Thanks to the developers for their work.

  • Requires Linux platform and multi-threading enabled on both sides.

Based on preliminary analysis, the bottleneck in 1.14.1's multi-threading appears to be the single decryption thread. Perhaps adding more decryption threads could be a direct improvement.
