[hashset feature] Convert SET datatype to use hashset instead of dict #1176
base: hashset
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##           hashset    #1176      +/-   ##
===========================================
+ Coverage    70.40%   70.58%   +0.17%
===========================================
  Files          115      115
  Lines        62480    63812    +1332
===========================================
+ Hits         43989    45039    +1050
- Misses       18491    18773     +282
return {[string match {*table size: $table_size*number of elements: $keys*} $htstats]}
}

test "SRANDMEMBER with a dict containing long chain" {
I deleted this test because hashset does not have linked-list chains the way dict does, so the behavior it was testing no longer exists.
For the keys and expires tables using hashset, I updated DEBUG HTSTATS to count probing chain lengths instead of linked-list lengths. Maybe it makes sense here too...?
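A minimal sketch of one way to count probe lengths for such stats, assuming "probing chain length" means how many slots an element sits past its home bucket under linear probing (the open-addressing analogue of an element's position in a dict bucket's linked list). The sketchSlot type and both functions are hypothetical, not the DEBUG HTSTATS implementation, and the toy slot stores its home bucket explicitly where a real table would recompute it from the element's hash.

```c
#include <assert.h>

#define SKETCH_BUCKETS 16
#define SKETCH_MAX_PROBE SKETCH_BUCKETS /* probe length is at most SKETCH_BUCKETS - 1 */

/* Hypothetical toy slot: 'used' marks an occupied slot and 'home' records
 * the bucket the element hashed to before linear probing moved it. */
typedef struct {
    int used;
    int home;
} sketchSlot;

/* Probe length of the element stored at 'idx': how many slots past its
 * home bucket a lookup has to step, wrapping around the end of the table. */
static int probeLength(int idx, int home) {
    assert(idx >= 0 && idx < SKETCH_BUCKETS && home >= 0 && home < SKETCH_BUCKETS);
    return (idx - home + SKETCH_BUCKETS) % SKETCH_BUCKETS;
}

/* Fills hist[k] with the number of elements whose probe length is k and
 * returns the longest probe length seen (0 for an empty table). */
static int probeLengthHistogram(const sketchSlot *slots, int hist[SKETCH_MAX_PROBE]) {
    int longest = 0;
    for (int k = 0; k < SKETCH_MAX_PROBE; k++) hist[k] = 0;
    for (int i = 0; i < SKETCH_BUCKETS; i++) {
        if (!slots[i].used) continue;
        int len = probeLength(i, slots[i].home);
        hist[len]++;
        if (len > longest) longest = len;
    }
    return longest;
}
```

A histogram skewed toward long probe lengths is the open-addressing counterpart of the long chains the deleted SRANDMEMBER test was built around.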
Hmm, I think this should be a hashset UT instead. Assuming our random sampling doesn't follow probe chains, we should be unaffected, but we want to guard against future regressions. My UT would build a hashset with one long chain of similar elements, then check that those elements aren't under- or over-represented.
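A minimal sketch of that UT's check, using a hypothetical toy table: toyTable, toyInsert, toyRandomElement, toyScanForwardElement and toySamplingIsFair are all made up for illustration and are not the hashset API. It builds one long linear-probing chain by giving 16 elements the same home bucket, then counts how often each element is drawn. The retry-until-occupied sampler is assumed to stand in for whatever SRANDMEMBER actually uses; the scan-forward sampler is included only to show the kind of probe-chain bias the test is meant to catch.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_BUCKETS 64

/* Hypothetical toy table (not the hashset API): each slot holds an element
 * id in [0, n), or -1 when the slot is empty. */
typedef struct {
    int slots[TOY_BUCKETS];
} toyTable;

typedef int (*toySampler)(const toyTable *t);

static void toyInit(toyTable *t) {
    for (int i = 0; i < TOY_BUCKETS; i++) t->slots[i] = -1;
}

/* Linear-probing insert starting at bucket 'home'. Inserting many elements
 * with the same home bucket builds one long probe chain. Returns the slot
 * used, or -1 if the table is full. */
static int toyInsert(toyTable *t, int id, int home) {
    assert(home >= 0 && home < TOY_BUCKETS);
    for (int i = 0; i < TOY_BUCKETS; i++) {
        int idx = (home + i) % TOY_BUCKETS;
        if (t->slots[idx] == -1) {
            t->slots[idx] = id;
            return idx;
        }
    }
    return -1;
}

/* Fair sampler: retry uniformly random slots until one is occupied, so every
 * element is equally likely regardless of chains. Table must be non-empty. */
static int toyRandomElement(const toyTable *t) {
    for (;;) {
        int idx = rand() % TOY_BUCKETS;
        if (t->slots[idx] != -1) return t->slots[idx];
    }
}

/* Biased sampler kept for contrast: start at a random slot and follow the
 * probe sequence to the next occupied slot. The first element after a run
 * of empty slots absorbs their probability. Table must be non-empty. */
static int toyScanForwardElement(const toyTable *t) {
    int idx = rand() % TOY_BUCKETS;
    while (t->slots[idx] == -1) idx = (idx + 1) % TOY_BUCKETS;
    return t->slots[idx];
}

/* The UT's check: draw 'samples' elements and require each of the n
 * elements to appear within +/- tolerance of samples / n times. */
static int toySamplingIsFair(const toyTable *t, int n, int samples,
                             double tolerance, toySampler sample) {
    int *counts = calloc((size_t)n, sizeof(int));
    if (counts == NULL) return 0;
    for (int s = 0; s < samples; s++) counts[sample(t)]++;
    double expected = (double)samples / n;
    int fair = 1;
    for (int i = 0; i < n; i++) {
        if (counts[i] < expected * (1.0 - tolerance) ||
            counts[i] > expected * (1.0 + tolerance)) {
            fair = 0;
        }
    }
    free(counts);
    return fair;
}
```

With 16 chained elements in a 64-slot table, the scan-forward sampler returns the first element of the chain for every random start that lands in the empty run before it, so the fairness check fails for it while passing for the retry sampler. The 20% tolerance used in such a check is an arbitrary choice for the sketch; a real test would pick the sample count and tolerance to keep flakiness negligible.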
Sounds good. Can you write that UT in a separate PR targeting the hashset branch?
Partial review. I'll look more later.
@@ -71,6 +71,7 @@
 * addressing scheme, including the use of linear probing by scan cursor
 * increment, by Viktor Söderqvist. */
#include "hashset.h"
+#include "server.h"
I tried hard to avoid having hashset depend on the whole valkey server.h. It's better to have dependencies in only one direction and to let hashset be a relatively independent component that depends only on low-level stuff like zmalloc.h.
You added this for dismissMemory, right?
I think we can move the logic of dismissMemory from object.c to zmalloc.c. dismissMemory basically just calls zmadvise_dontneed, which already knows the page size without using server.page_size. zmadvise_dontneed doesn't accept a size parameter, but we can change it, since it's actually only called from dismissMemory: we can add a size parameter and make it do everything dismissMemory does. In server.h we can then add dismissMemory as an alias (a define) of zmadvise_dontneed.
/* server.h */
#define dismissMemory zmadvise_dontneed
/* zmalloc.c */
void zmadvise_dontneed(void *ptr, size_t size_hint) {
/* Code moved from dismissMemory */
...
/* Code that was already in zmadvise_dontneed since before */
...
}
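A minimal, self-contained sketch of what the merged function could look like, under stated assumptions: that dismissMemory's logic amounts to skipping allocations no larger than half a page, and that zmadvise_dontneed advises only the whole pages inside the allocation. The names sketchDontneed and pageAlignedRange are hypothetical, the page size comes from sysconf(_SC_PAGESIZE), and size_hint is treated as the allocation's real size because the sketch has no allocator to ask (a real version would presumably ask the allocator). It depends only on libc and POSIX headers, which is the dependency direction the review is after.

```c
#define _DEFAULT_SOURCE /* exposes MADV_DONTNEED / MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Computes the largest page-aligned range inside [ptr, ptr + size).
 * Returns 1 and sets *start / *len if at least one whole page fits, else 0. */
static int pageAlignedRange(void *ptr, size_t size, size_t page_size,
                            char **start, size_t *len) {
    assert(page_size > 0 && (page_size & (page_size - 1)) == 0); /* power of two */
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t begin = ((uintptr_t)ptr + mask) & ~mask; /* round start up */
    uintptr_t end = ((uintptr_t)ptr + size) & ~mask;   /* round end down */
    if (end <= begin) return 0;
    *start = (char *)begin;
    *len = (size_t)(end - begin);
    return 1;
}

/* Hypothetical merged function (assumption, not Valkey's code): early return
 * for small sizes, then MADV_DONTNEED on the whole pages inside the block.
 * Returns the number of bytes advised, or 0 if nothing was released. */
static size_t sketchDontneed(void *ptr, size_t size_hint) {
    if (ptr == NULL) return 0;
    static size_t page_size = 0;
    if (page_size == 0) page_size = (size_t)sysconf(_SC_PAGESIZE);
    /* madvise releases whole pages only, so small blocks are not worth a syscall */
    if (size_hint <= page_size / 2) return 0;
    char *start;
    size_t len;
    if (!pageAlignedRange(ptr, size_hint, page_size, &start, &len)) return 0;
    if (madvise(start, len, MADV_DONTNEED) != 0) return 0;
    return len;
}
```

Calling it on a freshly mmap'd four-page buffer advises all four pages, while a 10-byte size hint returns immediately without a syscall.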
This changes the type of command tables from dict to hashset. Command table lookup takes ~3% of overall CPU time in benchmarks, so it is a good candidate for optimization. My initial SET benchmark comparison suggests that hashset is about 4.5 times faster than dict, and this replacement reduced overall CPU time by 2.79% 🥳
---------
Signed-off-by: Rain Valentine <[email protected]>
Co-authored-by: Rain Valentine <[email protected]>
Signed-off-by: Rain Valentine <[email protected]>
Force-pushed from 60f3f70 to add04d0
Sorry for force-pushing the hashset branch again to fix a DCO issue. Can you rebase and force-push again? (I guess that's better than merging when we have a DCO issue following us.)
A fairly straightforward conversion, though I had to do a lot of debugging along the way. It requires a few fixes in hashset.c to pass all tests; this PR contains minimal versions of those fixes, but my earlier PR (#1147) has better fixes for those issues.