Brute k-NN #257
Merged · 6 commits merged from brute-knn into elixir-nx:main on Apr 16, 2024

Conversation

krstopro (Member)

Implements brute-force k-NN search algorithm.

Closes #239.
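For intuition, here is a toy, list-based sketch of what brute-force k-NN search does (module and function names are made up for illustration; Scholar's actual implementation operates on Nx tensors and processes queries in batches): compute the distance from the query to every data point and keep the k smallest.

```elixir
defmodule BruteSketch do
  # Brute-force k-NN over plain lists: score every point, sort, take k.
  # Returns {distance, index} pairs for the k nearest data points.
  def knn(data, query, k, metric \\ &euclidean/2) do
    data
    |> Enum.with_index()
    |> Enum.map(fn {point, i} -> {metric.(query, point), i} end)
    |> Enum.sort()
    |> Enum.take(k)
  end

  defp euclidean(x, y) do
    Enum.zip(x, y)
    |> Enum.map(fn {a, b} -> (a - b) * (a - b) end)
    |> Enum.sum()
    |> :math.sqrt()
  end
end
```

Because the metric is an ordinary 2-arity function argument, any distance works, which is the property discussed below for the `:metric` option.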

@krstopro krstopro mentioned this pull request Apr 13, 2024
  doc: "The number of nearest neighbors."
],
metric: [
  type: {:or, [{:custom, Scholar.Options, :metric, []}, {:fun, 2}]},
krstopro (Member Author)

Perhaps Scholar.Options.metric should be edited to support functions of arity 2?

Contributor

If everywhere we accept a metric we also accept functions, then yes. The easiest is probably to make it so it always returns a 2-arity function and then we call it. :)

&Scholar.Metrics.Distance.cosine/2

fun when is_function(fun, 2) ->
  fun
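The normalization suggested above could look roughly like this (a sketch with illustrative names, operating on plain lists rather than Nx tensors): every accepted form of the metric option is converted once into a 2-arity function, so callers can always just invoke it.

```elixir
defmodule MetricSketch do
  # Hypothetical normalizer: whatever form the :metric option takes,
  # return a 2-arity distance function.
  def to_fun(:cosine), do: &cosine/2

  def to_fun({:minkowski, p}) when is_number(p) and p > 0,
    do: fn x, y -> minkowski(x, y, p) end

  def to_fun(fun) when is_function(fun, 2), do: fun

  defp minkowski(x, y, p) do
    Enum.zip(x, y)
    |> Enum.map(fn {a, b} -> abs(a - b) ** p end)
    |> Enum.sum()
    |> then(&(&1 ** (1 / p)))
  end

  defp cosine(x, y) do
    dot = Enum.zip(x, y) |> Enum.map(fn {a, b} -> a * b end) |> Enum.sum()
    norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1)))) end
    1 - dot / (norm.(x) * norm.(y))
  end
end
```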
krstopro (Member Author)

I am not 100% sure about this. Could there be issues with Nx backends?

Contributor

No, this is fine!

@@ -89,4 +89,25 @@ defmodule Scholar.Shared do

valid_broadcast(to_parse - 1, n_dims, shape1, shape2)
end

defn get_batches(tensor, opts) do
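For intuition, here is a hypothetical list-based analogue of such a helper (the real `defn` version slices Nx tensors): split the data into fixed-size batches and return any rows that do not fill a complete batch separately as the leftover.

```elixir
defmodule BatchSketch do
  # Split rows into batches of size batch_size; rows that do not fill a
  # complete batch are returned separately as the leftover.
  def get_batches(rows, batch_size) do
    {full, rest} = Enum.split(rows, div(length(rows), batch_size) * batch_size)
    {Enum.chunk_every(full, batch_size), rest}
  end
end
```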
krstopro (Member Author)

I cannot think of a better name here.

Contributor

Is it going to be used in other places? If so, where? Maybe keep it on BruteKNN until something else needs it? Btw, should this PR remove the current KNearestNeighbours module?

krstopro (Member Author), Apr 13, 2024

> Is it going to be used in other places? If so, where?

Not 100% sure, but I strongly believe we could use it for incremental PCA (#246, task 1).

> Btw, should this PR remove the current KNearestNeighbours module?

KNearestNeighbors is a supervised model; it performs classification or regression. I would keep it until we implement KNNClassifier and KNNRegressor (tasks 2 and 3 of #255).

Contributor

> Not 100% sure, but I strongly believe we could use it for incremental PCA (#246, task 1).

So I would keep it in the BruteKNN module. We can extract it when there is a use case.

> KNearestNeighbors is a supervised model; it performs classification or regression. I would keep it until we implement KNNClassifier and KNNRegressor (tasks 2 and 3 of #255).

Ok!

opts = NimbleOptions.validate!(opts, @opts_schema)

metric =
  case opts[:metric] do
Contributor

Following up on the above: I would move this normalization into Scholar.Options.metric then. That way, everywhere can rely on the metric being a 2-arity function.

krstopro (Member Author)

You mean that instead of returning the atom it should return an anonymous function of arity 2?
I agree, but some other modules would need to be edited as well. For example, the k-d tree:

metric: [
  type: {:custom, Scholar.Options, :metric, []},
  default: {:minkowski, 2},
  doc: ~S"""
  Name of the metric. Possible values:

  * `{:minkowski, p}` - Minkowski metric. By changing the value of the `p` parameter (a positive number or `:infinity`)
    we can set Manhattan (`1`), Euclidean (`2`), Chebyshev (`:infinity`), or any arbitrary $L_p$ metric.
  * `:cosine` - Cosine metric.
  """
]

And here is how it is used there:

case opts[:metric] do
  {:minkowski, 2} -> Distance.squared_euclidean(x1, x2)
  {:minkowski, p} -> Distance.minkowski(x1, x2, p: p)
  :cosine -> Distance.cosine(x1, x2)
end

I would honestly prefer metric to be stored as a function inside a field.

krstopro (Member Author), Apr 13, 2024

The main problem is that different algorithms support different metrics. Brute-force search works with literally any metric, while the current implementation of random projection forest works only with the Euclidean distance. I am not sure about the k-d tree; I think it works only with the three metrics specified in the docs.

Contributor

Then it is fine to keep this logic only in this module :) IMO

krstopro (Member Author), Apr 13, 2024

Agreed. I would still suggest converting atoms to functions and storing them as fields. Seems cleaner and doesn't require choosing the right function to call each time predict is used.
Also, it might be worth considering removing Scholar.Options.metric, given the differences in supported metrics across modules.
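As a sketch of the "store the metric as a field" idea (illustrative names, not Scholar's actual API): validate and convert the option once at fit time, so predict-time code simply calls the stored function instead of dispatching on atoms.

```elixir
defmodule KNNFieldSketch do
  defstruct [:data, :metric_fn]

  # Convert the metric option to a 2-arity function once, at fit time,
  # and keep that function in the fitted struct.
  def fit(data, metric_opt) do
    %__MODULE__{data: data, metric_fn: normalize(metric_opt)}
  end

  defp normalize(fun) when is_function(fun, 2), do: fun

  defp normalize(:euclidean) do
    fn x, y ->
      Enum.zip(x, y)
      |> Enum.map(fn {a, b} -> (a - b) * (a - b) end)
      |> Enum.sum()
      |> :math.sqrt()
    end
  end

  # Predict-time code just invokes the stored function; no atom dispatch.
  def distance(%__MODULE__{metric_fn: f}, x, y), do: f.(x, y)
end
```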

krstopro (Member Author)

I guess this is ready to merge. ^_^

@krstopro krstopro merged commit 5c1786f into elixir-nx:main Apr 16, 2024
2 checks passed
@krstopro krstopro mentioned this pull request May 12, 2024
@krstopro krstopro deleted the brute-knn branch May 15, 2024 17:01
Successfully merging this pull request may close these issues: "Implement batch version of brute-force k-NN search".