Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manifold version of LargeVis #272

Open
josevalim opened this issue May 30, 2024 · 7 comments
Open

Add manifold version of LargeVis #272

josevalim opened this issue May 30, 2024 · 7 comments
Assignees

Comments

@josevalim
Copy link
Contributor

We already have the building blocks for ANN. @krstopro, how much work do you believe it is to also have it for manifold/dimensionality reduction?

@krstopro
Copy link
Member

krstopro commented May 30, 2024

Probably not much (I still need to see), though I am worried about memory cost of current implementations. LargeVis dimensionality reduction uses large num_neighbors (150 in the paper) and RandomProjectionForest.predict/2 takes $O(QTK)$ memory for number of queries $Q$, number of trees $T$, and number of neighbors $K$. In any case, leave it to me.

By the way, thanks for the shout-out on the blog. :)

@krstopro
Copy link
Member

krstopro commented Jun 8, 2024

Apparently, this is much harder than I thought. One of the issues is sampling the negative edges of the k-NN graph, i.e. edges that are not in the graph. To do this, the authors allocate and iterate over a rather large table. I am not sure about this approach and I don't think I can come up with something efficiently implementable using Nx.

@josevalim
Copy link
Contributor Author

Why do you think it would be hard to implement it? The parr that seems complex is the head traversal, but if we can assume a max depth (not sure if that’s the case), it should be parallelizable.

@krstopro
Copy link
Member

krstopro commented Jun 9, 2024

It's not really about head traversal, more about allocating a table of size 1e8. I'm still checking if this can be avoided.

@krstopro
Copy link
Member

krstopro commented Jun 14, 2024

The problem above might be avoidable, but there is another one that might not be. Namely, the method involves symmetrising the k-NN graph and this requires the computation of reverse k-NN graph which I don't think I can implement efficiently within Nx.
Screenshot 2024-06-14 at 19 03 13

@josevalim
Copy link
Contributor Author

The symmetrized seems to be doable with a while loop over i,j?

@krstopro
Copy link
Member

krstopro commented Jul 21, 2024

Even if it is possible, it would be expensive as we would need to loop over $n \times k$ edges. A bigger issue is how to store the weights. $p_{j | i}$ is $n \times k$ matrix (or sparse $n \times n$ matrix) since we take $k$-nearest neighbors for each of $n$ points; each of $n$ nodes in $k$-NN graph has degree of $k$. $p_{i | j}$ requires reversing the $k$-NN graph and the degree of every node can vary from $0$ (the point is not among $k$-nearest neighbors of any other point) to $n - 1$ (the point is contained among $k$-nearest neighbors of every other point). I cannot think of a way to represent the weights with a tensor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants