-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
k12: improve API and support multithreaded computation #443
base: main
Are you sure you want to change the base?
Conversation
1cec358
to
02c9cde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 32 GB,
Cores | Speedup |
---|---|
1 | 1.00x |
2 | 1.49x |
4 | 2.48x |
8 | 4.27x |
For 32MB
Cores | Speedup |
---|---|
1 | 1.00x |
2 | 1.28x |
4 | 1.81x |
8 | 1.92x |
the speedup increases with the length of the message (as expected) but the largest test case (where numbers to look ok) requires message of many GB in memory. Not sure if this is gonna be realistic since usually large reads are done in batches of smaller sizes.
For medium-sized messages (say 32MB), the max speedup is below 2x even with 8 threads.
*not in opposition to this effort, but timings make me think if something must be done differently (or I missed something)
Those tests don't use a buffer of GBs. Instead they use the size suggested by
There clearly is some concurrency bottleneck here which I haven't figured out yet. |
80cdfd3
to
7df5996
Compare
Use options style API, so that the common case is very simple: h := k12.NewDraft10() but we can provide options elegantly: h := k12.NewDraft10( WithContext([]byte("some context")), WithWorkers(runtime.NumCPU()), ) Allows multithreaded computation with the WithWorkers() option. On M2 Pro scales well with a few workers, but isn't able to utilize all cores effectively. BenchmarkK12_100B-12 5223334 228.8 ns/op 437.07 MB/s BenchmarkK12_10K-12 105744 11183 ns/op 894.21 MB/s BenchmarkK12_100K-12 27141 44364 ns/op 2254.06 MB/s BenchmarkK12_3M-12 1172 1010401 ns/op 3243.07 MB/s BenchmarkK12_32M-12 100 10033730 ns/op 3265.78 MB/s BenchmarkK12_327M-12 12 98888208 ns/op 3313.64 MB/s BenchmarkK12_3276M-12 2 991111938 ns/op 3306.19 MB/s BenchmarkK12x2_32M-12 183 6510423 ns/op 5033.16 MB/s BenchmarkK12x2_327M-12 18 63622058 ns/op 5150.41 MB/s BenchmarkK12x2_3276M-12 2 632823584 ns/op 5178.06 MB/s BenchmarkK12x4_32M-12 364 3300120 ns/op 9929.34 MB/s BenchmarkK12x4_327M-12 39 29477854 ns/op 11116.14 MB/s BenchmarkK12x4_3276M-12 2 581819167 ns/op 11263.98 MB/s BenchmarkK12x8_32M-12 520 2301923 ns/op 14235.05 MB/s BenchmarkK12x8_327M-12 76 15590312 ns/op 21018.18 MB/s BenchmarkK12x8_3276M-12 4 296590469 ns/op 22096.46 MB/s BenchmarkK12xCPUs_32M-12 472 2526827 ns/op 12968.04 MB/s BenchmarkK12xCPUs_327M-12 78 15139957 ns/op 21643.39 MB/s BenchmarkK12xCPUs_3276M-12 4 280958114 ns/op 23325.90 MB/s We only reach 23GB/s (at 12x) instead of the lower bound of 33GB/s expected with 10 performance cores. Adds {Max,Next}WriteSize to suggest the caller how big to choose their Write() calls. Adds a test to the CI, to check the k12 package with the data race detector on.
Use options style API, so that the common case is very simple:
but we can provide options elegantly:
Allows multithreaded computation with the WithWorkers() option. On M2 Pro scales well with a few workers, but isn't able to utilize all cores effectively:
We only reach 23GB/s (at 12x) instead of the lower bound of 33GB/s expected with 10 performance cores.
Adds {Max,Next}WriteSize to suggest the caller how big to choose their Write() calls.