Popcount function is underperforming #59
So, it turns out that at least part of the disparity between the observed popcount throughput (about 12-14 GiB/s) and the expected bandwidth (about 20 GiB/s on a 2.4 GHz CPU) is explained by the difference between doing lots of popcounts on small arrays (the first case) and doing fewer popcounts on larger arrays (the second case). This is presumably because the "output pipeline" stalls at the point where each popcount is calculated and written back to memory, which happens much more often in the first case than in the second. There is no obvious fix for this; finding one would need further investigation, and the solution would almost certainly be fiddly and low-level.
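To make the two cases concrete, here is a rough sketch of the two call patterns and of how a GiB/s figure is derived from a timing. It is pure Python, so it will not reproduce the native numbers or the stall described above. All function names and the 128-byte chunk size are assumptions made for illustration, not code from this project.

```python
import os
import time
from typing import List


def popcount_bytes(buf: bytes) -> int:
    """Count the set bits in a byte string (pure-Python stand-in for a native popcount)."""
    return bin(int.from_bytes(buf, "little")).count("1")


def popcount_many_small(data: bytes, chunk_size: int) -> List[int]:
    """First case: one popcount per small chunk, one result written out per chunk."""
    return [
        popcount_bytes(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def throughput_gib_per_s(n_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GiB/s."""
    return n_bytes / seconds / 2**30


def best_time(fn, *args, repeats: int = 5) -> float:
    """Return the fastest of several runs to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    data = os.urandom(1 << 20)  # 1 MiB of random input (arbitrary size)
    chunk = 128                 # hypothetical "small array" size in bytes

    t_small = best_time(popcount_many_small, data, chunk)
    t_large = best_time(popcount_bytes, data)  # second case: one big popcount

    print(f"many small: {throughput_gib_per_s(len(data), t_small):.4f} GiB/s")
    print(f"one large:  {throughput_gib_per_s(len(data), t_large):.4f} GiB/s")
```

Both patterns count the same bits; only the number of calls and result writes differs, which is the variable the comparison above isolates.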
I'm having a look at switching to https://github.com/kimwalisch/libpopcnt instead of hand-rolling our own. This should also help us support other platforms and take advantage of newer or different instruction sets.
In branch hlaw-fix-issue56, the raw popcount function is reported to perform at about 12 GiB/s on my machine. However, essentially the same code performs at 18 GiB/s when called from a basic wrapper function thus:
Aha! Link: https://csiro.aha.io/features/ANONLINK-71