-
Notifications
You must be signed in to change notification settings - Fork 117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identity hash function by default on GCC #73
Comments
I think a good solution would be to define custom GCC users will get proper hash function by default. No changes for non-GCC users. And hash function will remain fully tweakable. |
Yes, at the time of creating the map, I used the default Your solution would be a good compromise, I'll check to work on it when I have more time but don't hesitate to create a PR in the meantime if you have the time. |
I started working on PR. I decided to check GCC / libstd-c++ to see which hash finalizer is used there. For example, this code takes 8 seconds for 50K elements on my machine:
The same happens on Clang / libc++. Clearly, this issue is very unlikely to happen unintentionally if prime size is used. |
In the PR, I decided to use hash finalizers from MurmurHash2. The newer MurmurHash3 uses one more shift & multiply round in finalizer. |
I ran the following code built with GCC in Linux VM:
and I got:
So I guess inserting integers divisible by 2^20 takes quadratic time.
Moreover, trying to insert 16M values results in a crash.
Most likely because
std::hash<uint64_t>(x) = x
on GCC.Note that I used default settings and got no warnings!
Awful hash function by default is rather critical issue for people who don't know much about hashing (and would probably do worse trying to implement their own hash function or hash table). And given that TSL interface is very STL-like, I think that's the audience it is targeted at.
A proper hash function usually contains three parts:
uint64_t
to an index in hash table.As usual, C++ standard is not precise enough, and STL is not cross-platform.
On MSVC, std::hash performs steps 1 and 2, while std::unordered_set only does step 3.
On GCC, std::hash only performs step 1, while std::unordered_set does steps 2 and 3.
It means that if you use std::hash directly, then you should run your own hash finalizer. TSL hash table only does step 3, but uses std::hash, meaning that the crucial step 2 is missed.
The text was updated successfully, but these errors were encountered: