fix hnsw param use #1695

Merged
merged 1 commit into from
Jan 8, 2024

Conversation

hermeGarcia
Contributor

Description

Describe the proposed changes made in this PR.

How was this PR tested?

Describe how you tested this PR.

codecov bot commented Dec 26, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4a7bb24) 82.09% compared to head (eae6515) 82.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1695      +/-   ##
==========================================
- Coverage   82.09%   82.07%   -0.03%     
==========================================
  Files         333      333              
  Lines       19400    19400              
==========================================
- Hits        15927    15923       -4     
- Misses       3473     3477       +4     
Flag Coverage Δ
ingest 69.25% <ø> (ø)
node-sidecar 95.62% <ø> (ø)
nucliadb 70.38% <ø> (-0.05%) ⬇️
reader 76.90% <ø> (ø)
sdk 40.46% <ø> (ø)
search 79.24% <ø> (ø)
standalone 88.27% <ø> (ø)
train 63.33% <ø> (ø)
utils 81.24% <ø> (ø)
writer 85.69% <ø> (ø)

@@ -28,7 +28,7 @@ pub fn level_factor() -> f64 {

/// Upper limit to the number of out-edges a embedding can have.
pub const fn m_max() -> usize {
-    30
+    60
Contributor

What are the implications of increasing this number?

(Question: why are we using a const fn instead of a constant?)

Contributor Author

It has several implications, both in terms of space and read/write speed. To learn more about them, I suggest looking at the HNSW paper, in particular the insert algorithm. In our case, we had m and m_max set to the same number, which makes insertion slower (as can be inferred from the algorithm). From what I saw, defining m_max as 2*m when m is small is a good approach.
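
To illustrate why m == m_max can slow insertion, here is a minimal sketch of the back-edge step of an HNSW-style insert. It is not the code in this repository: `Node`, `add_edge`, and `count_prunes` are hypothetical names, and the pruning rule (keep the closest neighbours) is a simplification of the neighbour-selection heuristics in the paper. The assumption is that a node already holding m edges is chosen as a neighbour by later inserts.

```rust
/// Hypothetical, simplified node: stores (neighbour id, distance) pairs only.
struct Node {
    edges: Vec<(usize, f32)>,
}

/// Adds a back-edge to `node`. If the out-degree then exceeds `m_max`,
/// keeps only the `m_max` closest neighbours (simplified pruning rule).
/// Returns true when pruning was needed.
fn add_edge(node: &mut Node, neighbour: usize, dist: f32, m_max: usize) -> bool {
    node.edges.push((neighbour, dist));
    if node.edges.len() <= m_max {
        return false;
    }
    // Pruning costs a sort here; a real index would re-run neighbour selection.
    node.edges.sort_by(|a, b| a.1.total_cmp(&b.1));
    node.edges.truncate(m_max);
    true
}

/// Counts how many prunes happen when a node that already holds `m` edges
/// receives `extra` more back-edges from later inserts.
fn count_prunes(m: usize, m_max: usize, extra: usize) -> usize {
    let mut node = Node {
        edges: (0..m).map(|i| (i, i as f32)).collect(),
    };
    (0..extra)
        .filter(|&i| add_edge(&mut node, m + i, 0.5 + i as f32, m_max))
        .count()
}
```

With m = 30 and m_max = 30, every extra back-edge pushes the node over its limit and triggers a prune. With m_max = 60, the same node can absorb 30 more back-edges before any pruning happens, at the cost of storing more edges per node.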

As for why it is a const fn and not a constant, I did it for uniformity's sake, since level_factor is a function. That said, I do not mind changing it to a constant.
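
For comparison, a small sketch of the two forms discussed here. The value 60 and the name `m_max` come from the diff; `M_MAX`, `EdgesFn`, and `EdgesConst` are hypothetical names used only for illustration. Both forms are evaluated at compile time and can be used in constant contexts such as array lengths, so the choice is mostly one of style.

```rust
/// Function form, as in this PR.
pub const fn m_max() -> usize {
    60
}

/// Constant form, as suggested in the review (hypothetical name).
pub const M_MAX: usize = 60;

// Both can size an array at compile time.
pub type EdgesFn = [u32; m_max()];
pub type EdgesConst = [u32; M_MAX];
```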

@hermeGarcia hermeGarcia merged commit d246f90 into main Jan 8, 2024
105 checks passed
@hermeGarcia hermeGarcia deleted the hnsw-better-param-use branch January 8, 2024 10:31
3 participants