Evaluate Parallel Hashmap for potential performance benefits #258
Comments
Hi @donpellegrino! Thanks for doing these cleanup PRs; most of them seem like straightforward changes to keep the CPP modern, so no problem merging those. Could you tell me a little about your interest in HDT? Back to this issue: I'm not that into CPP, so I can't really assess it, but that parallel hashmap looks like a nice addition. Would it make sense to have this in a testing branch first, to see whether there are performance impacts?
Yes, this change needs regression testing to ensure the operations give equivalent results when done in parallel, and performance testing to ensure there is justification for switching away from the standard library. I intend to establish a baseline with Intel Advisor and then try Parallel Hashmap in place of the standard library. One challenge is identifying a dataset large enough for performance testing. Ideally, it would be standardized and applicable to other approaches for processing triples, to support future comparative analyses. If there are any suggestions for datasets, please let me know.
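As an illustration of the kind of baseline comparison described above, here is a rough micro-benchmark sketch (not from the issue itself) that times bulk insertion of synthetic IRI-like keys into `std::unordered_map` versus `phmap::parallel_node_hash_map`. It assumes a clone of greg7mdp/parallel-hashmap is on the include path; the `fill_and_time` helper and the key shape are made up for illustration, and a realistic evaluation would use an actual triple dataset and HDT's own workloads.

```cpp
// Rough micro-benchmark sketch: compare bulk insertion of synthetic keys into
// std::unordered_map and phmap::parallel_node_hash_map. Assumes the
// parallel-hashmap headers are on the compiler's include path.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <unordered_map>
#include <parallel_hashmap/phmap.h>

// Insert n synthetic IRI-like keys into a default-constructed map and return
// the elapsed wall-clock time in seconds.
template <typename Map>
double fill_and_time(std::size_t n) {
    Map m;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        m.emplace("http://example.org/resource/" + std::to_string(i), i);
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

int main() {
    const std::size_t n = 1000000;
    std::cout << "std::unordered_map:             "
              << fill_and_time<std::unordered_map<std::string, std::size_t>>(n)
              << " s\n";
    std::cout << "phmap::parallel_node_hash_map:  "
              << fill_and_time<phmap::parallel_node_hash_map<std::string, std::size_t>>(n)
              << " s\n";
    return 0;
}
```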
…nt_unordered_map
…_set not found to be in use. Replaced some std::map with phmap, but more remain. std::set still to be replaced with phmap.
The `libhdt/src/dictionary/PlainDictionary.hpp` implementation previously made reference to "GOOGLE_HASH", with some compiler directives to include `<sparsehash/sparse_hash_map>` in place of `<ext/hash_map>`. No corresponding configuration options for that library were found in configure.ac. It looks like greg7mdp/parallel-hashmap is the currently maintained evolution of that codebase, and it may offer performance benefits over the standard `<unordered_map>`.

Enabling Parallel Hashmap can be done by pointing the compiler's include path at a clone of the GitHub repository. The following changes would be made in `libhdt/src/dictionary/PlainDictionary.hpp` (see the sketch after this list):

- Replace `#include <unordered_map>` with `#include <parallel_hashmap/phmap.h>`
- Replace `typedef std::unordered_map<const char *, DictionaryEntry *, hash<std::string>, str_cmp> DictEntryHash;` with `typedef phmap::parallel_node_hash_map<const char *, DictionaryEntry *, hash<std::string>, str_cmp> DictEntryHash;`
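For concreteness, here is a minimal, self-contained sketch of the proposed typedef in use, assuming a clone of greg7mdp/parallel-hashmap is on the include path. The `DictionaryEntry`, `str_hash`, and `str_cmp` definitions below are simplified stand-ins for the existing HDT ones, not the actual `PlainDictionary.hpp` code.

```cpp
// Minimal sketch of the proposed DictEntryHash typedef backed by
// phmap::parallel_node_hash_map instead of std::unordered_map.
#include <cstddef>
#include <cstring>
#include <functional>
#include <iostream>
#include <string>
#include <parallel_hashmap/phmap.h>

// Simplified stand-in for HDT's DictionaryEntry.
struct DictionaryEntry { std::size_t id; };

// Hash and equality over the pointed-to C string, mirroring the intent of the
// hash<std::string>/str_cmp pair referenced in PlainDictionary.hpp.
struct str_hash {
    std::size_t operator()(const char *s) const {
        return std::hash<std::string>{}(std::string(s));
    }
};
struct str_cmp {
    bool operator()(const char *a, const char *b) const {
        return std::strcmp(a, b) == 0;
    }
};

// Proposed replacement for the std::unordered_map-based typedef.
typedef phmap::parallel_node_hash_map<const char *, DictionaryEntry *,
                                      str_hash, str_cmp> DictEntryHash;

int main() {
    DictEntryHash dict;
    DictionaryEntry e1{1}, e2{2};
    dict["subject"] = &e1;
    dict["object"]  = &e2;
    std::cout << "entries: " << dict.size()
              << ", subject id: " << dict["subject"]->id << std::endl;
    return 0;
}
```

The drop-in nature of the change relies on parallel-hashmap exposing the same map interface as the standard containers, so only the include and the typedef need to change at the call sites.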
Corresponding changes would be needed to the GNU Build System (e.g., configure.ac) and installation instructions in README.md.
Testing would need to be done to determine if Parallel Hashmap could speed up HDT operations.
Related issues include #245 and #256. Getting more of the test suite operational may be a good prerequisite for a change such as this to ensure there are no breakages.