Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using croaring library instead of MaskCollection #4356

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ted-wq-x
Copy link
Contributor

@ted-wq-x ted-wq-x commented Oct 9, 2024

Description

Using croaring library instead of MaskCollection, and using new mask in NodeGroup::scan.
This improves performance by roughly 60% QPS in twitter2010,query like 'match (a:vertex)-[f]-(b) where a.ID=3961 return b'.

In SF100,single query like 'match (a:person {id:24189255912498})-[f*2]-(b) return distinct b limit 20' improve 35%

However,more performance testing is needed.

Contributor agreement

@ray6080 ray6080 requested review from ray6080 and andyfengHKU and removed request for benjaminwinger October 11, 2024 03:04
Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM.
This is a very cool optimization we should've done long time ago 😄

}
maskData = std::make_unique<MaskData>(maxOffset + 1);
}
explicit RoaringBitMapSemiMask(common::table_id_t tableID, common::offset_t maxOffset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, bitmap should be a single word. so can u rename this to RoaringBitmapSemiMask?

struct RoaringBitMapSemiMaskUtil {
static std::shared_ptr<RoaringBitMapSemiMask> createRoaringBitMapSemiMask(
common::table_id_t tableID, common::offset_t maxOffset) {
if (maxOffset > std::numeric_limits<u_int32_t>::max()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (maxOffset > std::numeric_limits<u_int32_t>::max()) {
if (maxOffset > std::numeric_limits<uint32_t>::max()) {

@@ -36,7 +36,7 @@ class ScanNodeTableSharedState {
common::node_group_idx_t currentUnCommittedGroupIdx;
common::node_group_idx_t numCommittedNodeGroups;
common::node_group_idx_t numUnCommittedNodeGroups;
std::unique_ptr<common::NodeVectorLevelSemiMask> semiMask;
std::shared_ptr<common::RoaringBitMapSemiMask> semiMask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should switch to shared_ptr here. I don't see clear win or loss of using shared_ptr here except that it avoids passing raw pointers around while making the life cycle of semiMask less obvious. @andyfengHKU should also comment here.

}

// include&exclude
std::vector<common::offset_t> range(uint32_t start, uint32_t end) override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if there is a shortcut when no value is masked at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no.

@ted-wq-x ted-wq-x requested a review from ray6080 October 16, 2024 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants