Using croaring library instead of MaskCollection #4356
Thanks! LGTM.
This is a very cool optimization we should've done a long time ago 😄
src/include/common/mask.h
Outdated
}
maskData = std::make_unique<MaskData>(maxOffset + 1);
}
explicit RoaringBitMapSemiMask(common::table_id_t tableID, common::offset_t maxOffset)
In my understanding, `bitmap` should be a single word, so can you rename this to `RoaringBitmapSemiMask`?
src/include/common/mask.h
Outdated
struct RoaringBitMapSemiMaskUtil {
static std::shared_ptr<RoaringBitMapSemiMask> createRoaringBitMapSemiMask(
common::table_id_t tableID, common::offset_t maxOffset) {
if (maxOffset > std::numeric_limits<u_int32_t>::max()) {
Suggested change:
- if (maxOffset > std::numeric_limits<u_int32_t>::max()) {
+ if (maxOffset > std::numeric_limits<uint32_t>::max()) {
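For context, the check above picks a mask implementation based on whether every offset fits in 32 bits. A minimal sketch of that selection follows, under stated assumptions: `SemiMask`, `Mask32`, `Mask64`, and `createSemiMask` are hypothetical names, and `std::set` stands in for the roaring bitmaps used by the PR. It also uses the standard `uint32_t` from `<cstdint>`, since `u_int32_t` is a non-standard typedef that is not available on every platform.

```cpp
#include <cstdint>
#include <limits>
#include <memory>
#include <set>

// Hypothetical stand-ins for illustration only; the PR's real classes
// wrap CRoaring bitmaps rather than std::set.
using offset_t = uint64_t;

struct SemiMask {
    virtual ~SemiMask() = default;
    virtual void mask(offset_t offset) = 0;
    virtual bool isMasked(offset_t offset) const = 0;
    virtual bool is64Bit() const = 0;
};

// Used when every offset fits in 32 bits.
struct Mask32 final : SemiMask {
    std::set<uint32_t> bits;
    void mask(offset_t offset) override { bits.insert(static_cast<uint32_t>(offset)); }
    bool isMasked(offset_t offset) const override {
        return offset <= std::numeric_limits<uint32_t>::max() &&
               bits.count(static_cast<uint32_t>(offset)) > 0;
    }
    bool is64Bit() const override { return false; }
};

// Used when offsets may exceed the 32-bit range.
struct Mask64 final : SemiMask {
    std::set<uint64_t> bits;
    void mask(offset_t offset) override { bits.insert(offset); }
    bool isMasked(offset_t offset) const override { return bits.count(offset) > 0; }
    bool is64Bit() const override { return true; }
};

// Choose the narrowest representation that can hold maxOffset.
std::shared_ptr<SemiMask> createSemiMask(offset_t maxOffset) {
    if (maxOffset > std::numeric_limits<uint32_t>::max()) {
        return std::make_shared<Mask64>();
    }
    return std::make_shared<Mask32>();
}
```

The boundary is inclusive on the 32-bit side: a `maxOffset` equal to `std::numeric_limits<uint32_t>::max()` still selects the 32-bit variant.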
@@ -36,7 +36,7 @@ class ScanNodeTableSharedState {
  common::node_group_idx_t currentUnCommittedGroupIdx;
  common::node_group_idx_t numCommittedNodeGroups;
  common::node_group_idx_t numUnCommittedNodeGroups;
- std::unique_ptr<common::NodeVectorLevelSemiMask> semiMask;
+ std::shared_ptr<common::RoaringBitMapSemiMask> semiMask;
Wonder if we should switch to `shared_ptr` here. I don't see a clear win or loss from using `shared_ptr` here, except that it avoids passing raw pointers around while making the life cycle of `semiMask` less obvious. @andyfengHKU should also comment here.
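To make the trade-off concrete, here is a minimal sketch of the two ownership styles. All names (`SemiMask`, `UniqueOwnerState`, `SharedOwnerState`, `borrow`, `share`) are hypothetical and simplified; this is not the PR's code.

```cpp
#include <memory>

// Hypothetical mask type for illustration only.
struct SemiMask {
    bool enabled = false;
};

// Option A: the shared state is the sole owner. Other operators borrow a raw
// pointer, so the mask dies exactly when the state dies, and borrowers must
// not outlive it.
struct UniqueOwnerState {
    std::unique_ptr<SemiMask> semiMask = std::make_unique<SemiMask>();
    SemiMask* borrow() const { return semiMask.get(); }
};

// Option B: ownership is shared. No raw pointers are passed around, but the
// mask now lives until the last holder releases it, which makes its life
// cycle harder to read off the code.
struct SharedOwnerState {
    std::shared_ptr<SemiMask> semiMask = std::make_shared<SemiMask>();
    std::shared_ptr<SemiMask> share() const { return semiMask; }
};
```

With option B, a copy obtained from `share()` keeps the mask alive even after the `SharedOwnerState` that created it has been destroyed; with option A, a pointer obtained from `borrow()` dangles in the same situation.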
}

// include&exclude
std::vector<common::offset_t> range(uint32_t start, uint32_t end) override {
Wonder if there is a shortcut when no value is masked at all.
It seems that there is none.
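For illustration, one way such a shortcut could look, assuming "no value is masked" means the mask is empty and that `range` returns the masked offsets in the half-open interval `[start, end)`. `SketchMask` is a hypothetical class and `std::set` stands in for the roaring bitmap; whether an equivalent early exit pays off with CRoaring would need measuring.

```cpp
#include <cstdint>
#include <set>
#include <vector>

using offset_t = uint64_t;

// Hypothetical stand-in for the roaring-bitmap-backed mask.
class SketchMask {
public:
    void mask(uint32_t offset) { masked.insert(offset); }

    // Returns the masked offsets in [start, end): start inclusive, end exclusive.
    std::vector<offset_t> range(uint32_t start, uint32_t end) const {
        // Shortcut: nothing is masked, the interval is empty, or the interval
        // lies entirely outside the smallest and largest masked offsets.
        if (masked.empty() || start >= end || *masked.rbegin() < start ||
            *masked.begin() >= end) {
            return {};
        }
        std::vector<offset_t> result;
        for (auto it = masked.lower_bound(start); it != masked.end() && *it < end; ++it) {
            result.push_back(*it);
        }
        return result;
    }

private:
    std::set<uint32_t> masked; // kept sorted, so begin()/rbegin() are min/max
};
```

The early return avoids the iterator walk and the result allocation for node groups that cannot contain any masked offset.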
Description
Use the CRoaring library instead of MaskCollection, and use the new mask in NodeGroup::scan.
This improves QPS by roughly 60% on twitter2010 for queries like 'match (a:vertex)-[f]-(b) where a.ID=3961 return b'.
On SF100, a single query like 'match (a:person {id:24189255912498})-[f*2]-(b) return distinct b limit 20' improves by 35%.
However, more performance testing is needed.