[FEA] TSNE and UMAP: Allow input to be precomputed distance matrix #4799
Comments
@RichieHakim did you try using the |
@divyegala |
Could you be more specific about what doesn't work for you? Is it with a specific metric? Do you get a crash, or simply bad-looking results? Here is an example for UMAP:
|
After rereading some of your posts, I think I understand your point. There is indeed an issue with the way the Python API for UMAP is currently designed. However, this should not prevent users from using the precomputed KNN graph feature. Indeed, when using the
Created PR #4865 |
Unfortunately, the choice to not support the As far as the correctness of the existing
HDBSCAN does not accept a |
@RichieHakim Looking through some of your other issues, I think I now understand what you are looking for: you already have a pairwise distance matrix (and we might not be able to assume here that you have access to the original data), and you'd like to be able to invoke UMAP and TSNE with it. Can you give us an idea of the size of this pairwise distance matrix? While we work to prioritize this feature, would it be helpful if you were able to convert your pairwise distance matrix into a valid knn graph by using something like cupy to do a sort/argsort and then use the |
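A rough sketch of the sort/argsort conversion suggested here, in case it helps. It is written against NumPy so it runs anywhere; CuPy provides `argsort` and `take_along_axis` with the same names, so swapping the import should be enough for a GPU array. The helper names `knn_from_pairwise` and `to_csr_parts` are made up for illustration, and whether the resulting graph should include each point as its own neighbor (it does here when the diagonal is zero) is an assumption to check against what `knn_graph` expects.

```python
import numpy as np  # hypothetical swap: `import cupy as np` for GPU arrays


def knn_from_pairwise(dist, k):
    """Return (indices, distances) of the k smallest entries in each row.

    `dist` is a square (n, n) pairwise distance matrix. With a zero
    diagonal, each point shows up as its own nearest neighbor.
    """
    dist = np.asarray(dist)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("dist must be a square matrix")
    n = dist.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and n")
    # Sort each row by distance and keep the first k column indices.
    order = np.argsort(dist, axis=1)[:, :k]
    # Gather the matching distances in the same order.
    knn_dists = np.take_along_axis(dist, order, axis=1)
    return order, knn_dists


def to_csr_parts(indices, distances):
    """Flatten (n, k) neighbor arrays into CSR (data, indices, indptr)."""
    n, k = indices.shape
    # Every row holds exactly k entries, so row pointers step by k.
    indptr = np.arange(0, n * k + 1, k)
    return distances.ravel(), indices.ravel(), indptr


dist = np.array([[0.0, 1.0, 4.0],
                 [1.0, 0.0, 2.0],
                 [4.0, 2.0, 0.0]])
idx, d = knn_from_pairwise(dist, k=2)
data, cols, indptr = to_csr_parts(idx, d)
```

The three arrays from `to_csr_parts` are in the layout that a CSR sparse matrix constructor takes as `(data, indices, indptr)`, which should be one way to get "a sparse array containing the k-nearest neighbors". Note that a full argsort needs the whole n x n matrix in memory, which is why the size of the matrix matters.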
This issue has been labeled |
Is your feature request related to a problem? Please describe.
Yes. Currently, TSNE and UMAP do not actually allow for a precomputed distance matrix to be used. This is despite there being arguments that suggest this functionality exists.
Describe the solution you'd like
Implementation of methods to allow for a custom distance matrix to be used as the sole data input to TSNE and/or UMAP. This functionality currently exists in sklearn and would allow for users already using this feature in sklearn's TSNE to directly port their workflow to rapids.
https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
metric: str or callable, default='euclidean': ... If metric is "precomputed", X is assumed to be a distance matrix. ...
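For reference, a minimal sketch of the sklearn workflow being described, assuming only a distance matrix is available. The random points are there purely to produce a valid distance matrix for the example; `init="random"` is set because sklearn does not allow PCA initialization together with `metric="precomputed"`.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data, used only to build a square distance matrix.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 5))
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))  # (30, 30) Euclidean distances

# The distance matrix is the sole input; the original points are not used.
tsne = TSNE(
    n_components=2,
    metric="precomputed",
    init="random",      # PCA init is not available for precomputed input
    perplexity=5.0,     # must be smaller than the number of samples
    random_state=0,
)
embedding = tsne.fit_transform(dist)
```

This is the call pattern that the request asks to be able to port as-is.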
Describe alternatives you've considered
What currently exists is the knn_graph argument for the .fit method. The documentation suggests that a precomputed distance matrix can be used by providing "a sparse array containing the k-nearest neighbors", but this method is not functional.
Additional context
This is a highly desirable feature.
Similar requests here, here, here
Thanks!