
Cannot return the results in a contigious 2D array. Probably ef or M is too small #373

Open
prateekpatelsc opened this issue Feb 16, 2022 · 3 comments · May be fixed by #513

Comments

@prateekpatelsc

@yurymalkov: trying to understand when this can happen.
I have an index with a few hundred thousand elements and no deletions.
My topK is in the range 100-500, and search ef is ~400.

Could you please elaborate on the scenarios in which the algorithm can run into this case, and what the correct way to handle it is?
Increasing M leads to larger index sizes and slower search times, so I am not inclined to go that route.
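
For context, a minimal sketch of the query pattern with a retry-on-failure fallback; the dimensionality, M, and random data below are placeholders, not the actual setup:

```python
import numpy as np
import hnswlib

# Placeholder setup mirroring the numbers above: a few hundred
# thousand vectors, no deletions. dim, M, and the random data are
# assumptions, not the real workload.
dim, num_elements = 128, 300_000
data = np.float32(np.random.random((num_elements, dim)))

index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=400, M=16)
index.add_items(data)

def robust_query(index, queries, k, ef=400, max_ef=4096):
    # hnswlib raises RuntimeError("Cannot return the results in a
    # contigious 2D array...") when it cannot collect k neighbours
    # per query; retrying with a doubled ef is one possible fallback.
    while True:
        index.set_ef(max(ef, k))
        try:
            return index.knn_query(queries, k=k)
        except RuntimeError:
            if ef >= max_ef:
                raise
            ef *= 2

labels, distances = robust_query(index, data[:10], k=500)
```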

@prateekpatelsc
Author

Is this because the search gets stuck in a local minimum, where no neighbors improve the distance and the search queue is empty?

@prateekpatelsc
Author

Also, is this somehow related to data dimensionality? For example, for a fixed M, search ef, and ef_construction, are these errors more likely for large, high-dimensional data than for data of similar scale but lower dimensionality?

@yurymalkov
Member

Hi @prateekpatelsc,

That might be connected to duplicates in the data. Duplicates are useless for search, since each duplicate component can be substituted with a single element, decreasing the index size and complexity (and one can control them on the client side by doing a search before inserting a batch). Incremental graph-based approaches can suffer from duplicates due to loss of connectivity in the graph when the size of the largest duplicate component is much larger than the number of links M (and it is hard to check for duplicates for inner product inside the algorithm).
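
A minimal sketch of that client-side control, assuming the Python bindings and an l2 space (the `eps` duplicate threshold is an assumption to be tuned per metric):

```python
import numpy as np

def add_batch_deduplicated(index, batch, eps=1e-9):
    # Before inserting, search the existing index for each new vector
    # and drop ones that already have a (near-)exact match, so that no
    # large duplicate component ever enters the graph.
    if index.get_current_count() > 0:
        labels, dists = index.knn_query(batch, k=1)
        batch = batch[dists[:, 0] > eps]
    if len(batch) > 0:
        index.add_items(batch)
```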

Can you check whether your data has large components of duplicates?
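
One quick way to check for that, assuming the raw vectors are available as a NumPy array `data`:

```python
import numpy as np

# Count exact duplicate rows and compare the largest group to M.
# `data` is assumed to be the (N, dim) float32 matrix that was indexed.
_, counts = np.unique(data, axis=0, return_counts=True)
print("largest duplicate component:", counts.max())
# A value much larger than the number of links M points to the
# connectivity loss described above.
```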
