[hashset feature] Convert SET datatype to use hashset instead of dict #1176

SoftlyRaining · 2024-10-16T01:23:20Z

A fairly straightforward conversion, though it took a lot of debugging along the way. Passing all tests requires a few fixes in hashset.c. This PR contains minimal versions of those fixes; my earlier PR (#1147) has better fixes for the same issues.

codecov · 2024-10-16T01:38:06Z

Codecov Report

Attention: Patch coverage is 86.27451% with 28 lines in your changes missing coverage. Please review.

Project coverage is 70.58%. Comparing base (8fe59b3) to head (add04d0).

Files with missing lines	Patch %	Lines
src/debug.c	0.00%	14 Missing ⚠️
src/defrag.c	72.72%	6 Missing ⚠️
src/rdb.c	83.33%	3 Missing ⚠️
src/db.c	92.00%	2 Missing ⚠️
src/t_set.c	97.05%	2 Missing ⚠️
src/hashset.c	93.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           hashset    #1176      +/-   ##
===========================================
+ Coverage    70.40%   70.58%   +0.17%     
===========================================
  Files          115      115              
  Lines        62480    63812    +1332     
===========================================
+ Hits         43989    45039    +1050     
- Misses       18491    18773     +282

Files with missing lines	Coverage Δ
src/object.c	`80.75% <100.00%> (+1.56%)`	⬆️
src/server.c	`88.86% <100.00%> (+0.14%)`	⬆️
src/server.h	`100.00% <ø> (ø)`
src/t_zset.c	`95.64% <100.00%> (+<0.01%)`	⬆️
src/hashset.c	`67.56% <93.33%> (+27.35%)`	⬆️
src/db.c	`88.79% <92.00%> (+0.28%)`	⬆️
src/t_set.c	`97.49% <97.05%> (-0.34%)`	⬇️
src/rdb.c	`75.95% <83.33%> (-0.41%)`	⬇️
src/defrag.c	`86.02% <72.72%> (-0.90%)`	⬇️
src/debug.c	`51.80% <0.00%> (-1.92%)`	⬇️

... and 84 files with indirect coverage changes

SoftlyRaining · 2024-10-16T01:44:05Z

tests/unit/type/set.tcl

-        return {[string match {*table size: $table_size*number of elements: $keys*} $htstats]}
-    }
-
-    test "SRANDMEMBER with a dict containing long chain" {


I deleted this test because hashset does not have linked list chains the way that dict does, so the aspect this is attempting to test no longer exists.

For the keys and expires using hashset, I updated DEBUG HTSTATS to count probing chain lengths instead of linked list lengths. Maybe it makes sense here too...?

Hmm, I think this should be a hashset UT instead. Assuming that our random sampling doesn't follow probe chains, we should be unaffected, but we want to guard against regressions in the future. My UT would make a hashset with one long chain of similar elements, then ensure those elements aren't under- or over-represented.

Sounds good. Can you write that UT in another PR towards the hashset branch?
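The unit test proposed in this thread could look roughly like the sketch below. It does not use the hashset.c API: it builds a toy open-addressing table with linear probing, forces one long probe chain, and checks that sampled frequencies stay close to uniform. All names (`toyTable`, `toySampleFair`, `toySampleBiased`, `toySamplingIsFair`) are hypothetical. The "biased" sampler is only an assumed example of a regression, one that walks forward from a random slot to the next occupied one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64 /* power of two, so `% TOY_SLOTS` is unbiased */
#define TOY_EMPTY (-1)

typedef struct {
    int slot[TOY_SLOTS];
} toyTable;

typedef int (*toySampler)(const toyTable *t, uint64_t *rng);

void toyInit(toyTable *t) {
    for (size_t i = 0; i < TOY_SLOTS; i++) t->slot[i] = TOY_EMPTY;
}

/* Linear probing insert. `home` stands in for the hash value, so the caller
 * can force collisions. The table must not be full. */
void toyInsertAt(toyTable *t, size_t home, int value) {
    assert(value >= 0);
    size_t i = home % TOY_SLOTS;
    while (t->slot[i] != TOY_EMPTY) i = (i + 1) % TOY_SLOTS;
    t->slot[i] = value;
}

/* SplitMix64, used only to keep the test deterministic. */
uint64_t toyRand(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Fair: pick a random slot and retry while it is empty. */
int toySampleFair(const toyTable *t, uint64_t *rng) {
    for (;;) {
        size_t i = toyRand(rng) % TOY_SLOTS;
        if (t->slot[i] != TOY_EMPTY) return t->slot[i];
    }
}

/* Biased: walk forward to the next occupied slot. Elements that follow a
 * run of empty slots are picked too often. */
int toySampleBiased(const toyTable *t, uint64_t *rng) {
    size_t i = toyRand(rng) % TOY_SLOTS;
    while (t->slot[i] == TOY_EMPTY) i = (i + 1) % TOY_SLOTS;
    return t->slot[i];
}

/* Returns 1 if every element's sample count is within `tolerance`
 * (a fraction, e.g. 0.10) of the uniform expectation, else 0. */
int toySamplingIsFair(toySampler sample, size_t samples, double tolerance) {
    toyTable t;
    toyInit(&t);
    int n = 0;
    /* One long chain: 16 elements that all hash to slot 0. */
    for (int i = 0; i < 16; i++) toyInsertAt(&t, 0, n++);
    /* A few isolated elements elsewhere in the table. */
    for (size_t home = 20; home <= 50; home += 10) toyInsertAt(&t, home, n++);

    size_t counts[TOY_SLOTS] = {0};
    uint64_t rng = 12345;
    for (size_t i = 0; i < samples; i++) counts[sample(&t, &rng)]++;

    double expected = (double)samples / n;
    for (int v = 0; v < n; v++) {
        double diff = (double)counts[v] - expected;
        if (diff < 0) diff = -diff;
        if (diff > tolerance * expected) return 0;
    }
    return 1;
}
```

Calling `toySamplingIsFair(toySampleFair, 200000, 0.10)` should pass, while the same call with `toySampleBiased` should fail, which is the kind of regression the proposed UT is meant to catch. A real test would build the chain through the hashset API rather than a forced home slot.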

zuiderkwast

Partial review. I'll look more later.

zuiderkwast · 2024-10-16T17:11:32Z

tests/unit/type/set.tcl

-        return {[string match {*table size: $table_size*number of elements: $keys*} $htstats]}
-    }
-
-    test "SRANDMEMBER with a dict containing long chain" {


For the keys and expires using hashset, I updated DEBUG HTSTATS to count probing chain lengths instead of linked list lengths. Maybe it makes sense here too...?

zuiderkwast · 2024-10-16T17:14:26Z

src/hashset.c

@@ -71,6 +71,7 @@
 *   addressing scheme, including the use of linear probing by scan cursor
 *   increment, by Viktor Söderqvist. */
 #include "hashset.h"
+#include "server.h"


I tried hard to avoid having hashset depend on the whole valkey server.h. It's better to have dependencies in only one direction and allow hashset to be a relatively independent component only depending on low level stuff like zmalloc.h.

You added this for dismissMemory, right?

I think we can move the logic of dismissMemory from object.c to zmalloc.c. dismissMemory basically just calls zmadvise_dontneed, which already knows the page size without using server.page_size. zmadvise_dontneed doesn't accept a size parameter, but we can change that, since it's only called from dismissMemory. We can add a size parameter and make it do everything dismissMemory does. In server.h we can add dismissMemory as an alias (define) of zmadvise_dontneed.

/* server.h */
#define dismissMemory zmadvise_dontneed

/* zmalloc.c */
void zmadvise_dontneed(void *ptr, size_t size_hint) {
    /* Code moved from dismissMemory */
    ...
    /* Code that was already in zmadvise_dontneed since before */
    ...
}
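To make the suggestion concrete, here is a minimal sketch of how a size-aware variant might pick the range to release. This is not the valkey code: `wholePagesWithin` and `dismissSketch` are hypothetical names, the caller-supplied size is trusted as the real allocation size (an assumption), and the `madvise` call is only compiled on Linux. The point is that the page size comes from `sysconf`, so nothing from server.h is needed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes madvise() on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__linux__)
#include <sys/mman.h>
#include <unistd.h>
#endif

/* Hypothetical helper: find the whole pages fully contained in
 * [addr, addr + size). Stores the page-aligned start in *aligned_start and
 * returns the length in bytes (a multiple of page_size), or 0 if no whole
 * page fits. page_size must be a power of two. */
size_t wholePagesWithin(uintptr_t addr, size_t size, size_t page_size, uintptr_t *aligned_start) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t mask = (uintptr_t)page_size - 1;
    uintptr_t start = (addr + mask) & ~mask; /* round up to a page boundary */
    size_t skipped = (size_t)(start - addr);
    *aligned_start = start;
    if (size < skipped + page_size) return 0;
    return (size - skipped) & ~(size_t)mask; /* round down to whole pages */
}

/* Hypothetical merged function: tell the kernel the pages are no longer
 * needed. The contents of the released pages are discarded. */
void dismissSketch(void *ptr, size_t size) {
    if (ptr == NULL) return;
#if defined(__linux__)
    static size_t page_size = 0;
    if (page_size == 0) page_size = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t start;
    size_t len = wholePagesWithin((uintptr_t)ptr, size, page_size, &start);
    if (len > 0) madvise((void *)start, len, MADV_DONTNEED);
#else
    (void)size; /* no-op on other platforms in this sketch */
#endif
}
```

With a 4096-byte page, a buffer starting at address 4100 with size 8192 contains only one whole page, starting at 8192, so small or badly aligned buffers are skipped without a system call.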

src/hashset.c

This changes the type of command tables from dict to hashset. Command table lookup takes ~3% of overall CPU time in benchmarks, so it is a good candidate for optimization. My initial SET benchmark comparison suggests that hashset is about 4.5 times faster than dict and this replacement reduced overall CPU time by 2.79% 🥳

Signed-off-by: Rain Valentine <[email protected]>
Co-authored-by: Rain Valentine <[email protected]>

Signed-off-by: Rain Valentine <[email protected]>

zuiderkwast · 2024-10-18T09:45:53Z

Sorry for force-pushing the hashset branch again, to fix a DCO issue.

Can you rebase and force-push again? (I guess it's better than merge when we have a DCO issue following us.)

zuiderkwast force-pushed the hashset branch from 19576b5 to 8fe59b3 Compare October 17, 2024 13:35

Convert SET from dict -> hashset (squashed)

add04d0

Signed-off-by: Rain Valentine <[email protected]>

SoftlyRaining force-pushed the set-datatype branch from 60f3f70 to add04d0 Compare October 17, 2024 22:00

zuiderkwast force-pushed the hashset branch from 8fe59b3 to 3038293 Compare October 18, 2024 07:22
