-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Brute k-NN #257
Brute k-NN #257
Conversation
doc: "The number of nearest neighbors." | ||
], | ||
metric: [ | ||
type: {:or, [{:custom, Scholar.Options, :metric, []}, {:fun, 2}]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps Scholar.Options.metric
should be edited to support functions of arity 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If everywhere we accept a metric we also accept functions, then yes. The easiest is probably to make it so it always returns a 2-arity function and then we call it. :)
&Scholar.Metrics.Distance.cosine/2 | ||
|
||
fun when is_function(fun, 2) -> | ||
fun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure about this. Could there be issues with Nx
backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is fine!
lib/scholar/shared.ex
Outdated
@@ -89,4 +89,25 @@ defmodule Scholar.Shared do | |||
|
|||
valid_broadcast(to_parse - 1, n_dims, shape1, shape2) | |||
end | |||
|
|||
defn get_batches(tensor, opts) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot think of a better name here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it going to be used in other places? If so, where? Maybe keep it on BruteKNN until something else needs it? Btw, should this PR remove the current KNearestNeighbours module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it going to be used in other places? If so, where?
Not 100% sure, but I strongly believe we could use it for incremental PCA (#246, task 1).
Btw, should this PR remove the current KNearestNeighbours module?
KNearestNeighbors
is a supervised model; it performs classification or regression. I would keep it until we implement KNNClassifier
and KNNRegressor
(task 2 and 3 of #255).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% sure, but I strongly believe we could use it for incremental PCA (#246, task 1).
So I would keep it in the BruteKNN module. We extract it when there is a use case.
KNearestNeighbors is a supervised model; it performs classification or regression. I would keep it until we implement KNNClassifier and KNNRegressor (task 2 and 3 of #255).
Ok!
opts = NimbleOptions.validate!(opts, @opts_schema) | ||
|
||
metric = | ||
case opts[:metric] do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up above: I would move this normalization to Scholar.Options.metric then. This way everywhere can rely on it being a 2-arity function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean instead of returning the atom to return the anonymous function of arity 2?
I agree, but some other modules need to be edited as well. For example k-d tree:
scholar/lib/scholar/neighbors/kd_tree.ex
Lines 33 to 43 in 0d1bcc1
metric: [ | |
type: {:custom, Scholar.Options, :metric, []}, | |
default: {:minkowski, 2}, | |
doc: ~S""" | |
Name of the metric. Possible values: | |
* `{:minkowski, p}` - Minkowski metric. By changing value of `p` parameter (a positive number or `:infinity`) | |
we can set Manhattan (`1`), Euclidean (`2`), Chebyshev (`:infinity`), or any arbitrary $L_p$ metric. | |
* `:cosine` - Cosine metric. | |
""" |
And here is how it is used there:
scholar/lib/scholar/neighbors/kd_tree.ex
Lines 283 to 286 in 0d1bcc1
case opts[:metric] do | |
{:minkowski, 2} -> Distance.squared_euclidean(x1, x2) | |
{:minkowski, p} -> Distance.minkowski(x1, x2, p: p) | |
:cosine -> Distance.cosine(x1, x2) |
I would honestly prefer metric to be stored as a function inside a field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is that different algorithms support different metrics. Brute-force search works with literally any metric, while current implementation of random projection forest works only with the Euclidean distance. I am not sure about k-d tree; I think it works only with the three metrics specified in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine then to keep this logic only in this module then :) IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I would still suggest converting atoms to functions and storing them as fields. Seems cleaner and doesn't require choosing the right function to call each time predict
is used.
Also, it might be worth considering removing Scholar.Options.metric
given the differences in metrics supported between modules.
I guess this is ready to merge. ^_^ |
Implements brute-force k-NN search algorithm.
Closes #239.