
Tsvd fix #3

Merged: 79 commits, Oct 27, 2024

Changes from all commits (79 commits):
6438349 · Remove redundant sorting (#220) · msluszniak, Dec 14, 2023
99a5286 · Fix cubic spline when x isn't sorted (#219) · msluszniak, Dec 14, 2023
f651fdc · Standard Scaler fit-transform interface (#179) · santiago-imelio, Dec 14, 2023
c82c22a · Fix benchmarks · josevalim, Dec 11, 2023
b9ff291 · Simplify standard_scale delegate · josevalim, Dec 15, 2023
e8fee47 · Only provide defn version of KDTree · josevalim, Dec 18, 2023
f3b382b · Remove unused second argument · josevalim, Dec 18, 2023
2a0ef38 · No need for stable sorting once unique · josevalim, Dec 18, 2023
6c60968 · Replace second order argsort with permutation inverse (#223) · jonatanklosko, Dec 18, 2023
00f3de3 · Add other non-encoding preprocessing utilities as separate modules (#… · msluszniak, Dec 20, 2023
d030f00 · Add one line summaries back · josevalim, Dec 20, 2023
410ced4 · Improve docs · josevalim, Dec 20, 2023
d541516 · Add Random Projection Forests (#215) · krstopro, Dec 24, 2023
9819798 · Introduce encoders in separate modules (#225) · msluszniak, Dec 29, 2023
64d1840 · Add normalizer (#227) · msluszniak, Dec 29, 2023
0e99c60 · Move iota inside loop · josevalim, Dec 31, 2023
58ee5e3 · Upgrade to tucan 0.3.0 (#229) · pnezis, Jan 3, 2024
db9efd7 · Random Projection Forest improvements (#231) · krstopro, Jan 16, 2024
cd64e15 · LargeVis (#232) · krstopro, Jan 21, 2024
4dccc0a · Add Mean Pinball Loss function (#235) · JoaquinIglesiasTurina, Mar 4, 2024
ce39654 · Nn descent (#233) · msluszniak, Mar 5, 2024
fe5be67 · Update deps and changelog (#240) · josevalim, Mar 7, 2024
c5614a9 · Trimap (#236) · msluszniak, Mar 8, 2024
e037b24 · Update README.md (#243) · lkarthee, Mar 11, 2024
0619dc5 · Update NNDescent (#245) · msluszniak, Apr 5, 2024
c314152 · Multinomial Naive Bayes Improvements (#248) · krstopro, Apr 7, 2024
0d1bcc1 · Added ndcg metric (#251) · norm4nn, Apr 7, 2024
5c1786f · Brute k-NN (#257) · krstopro, Apr 16, 2024
15c2eb5 · Add det curve (#258) · srzeszut, Apr 17, 2024
f64e65a · Add bayesian ridge (#247) · JoaquinIglesiasTurina, Apr 21, 2024
3e83bbb · Fix bug with floating point data (#261) · msluszniak, Apr 24, 2024
eb63b68 · Update maintainers and ExDoc · josevalim, Apr 25, 2024
70a85ff · Properly cast mode to u8 in KDTree · josevalim, Apr 25, 2024
9a931aa · Update erts on CI · josevalim, Apr 25, 2024
06e43dd · Use functions for constants · josevalim, Apr 27, 2024
2489482 · Improvements to notebook titles · josevalim, May 7, 2024
e0e92d0 · Add livebook (#262) · msluszniak, May 8, 2024
ebdae8f · Output distances in kdtree (#264) · msluszniak, May 13, 2024
96c4e5b · Remove unecessary type merges · josevalim, May 14, 2024
ffaac87 · K-NN Classifier (#263) · krstopro, May 14, 2024
f358b24 · Unify neighbors metrics · josevalim, May 14, 2024
582b220 · Add a test for custom metric on BruteKNN · josevalim, May 14, 2024
322687a · Rename predict_proba to predict_probability · josevalim, May 14, 2024
74ed5fe · K-NN Regressor (#268) · krstopro, May 16, 2024
09d500a · Unify weight handling and refactor linear models' helper functions (#… · JoaquinIglesiasTurina, May 16, 2024
af3b8eb · Update knn notebooks (#269) · msluszniak, May 28, 2024
422cfea · Update notebooks · josevalim, May 28, 2024
92cb3d2 · More notebook fixes · josevalim, May 28, 2024
1e4a01c · More updates · josevalim, May 28, 2024
0b8214e · Use latest ExDoc · josevalim, May 28, 2024
d3a4f64 · Add pending modules to sidebar · josevalim, May 28, 2024
2493110 · Update CHANGELOG · josevalim, May 28, 2024
e1897fa · Rename `RandomForestTree` to `RandomProjectionForest` in CHANGELOG.md… · krstopro, May 28, 2024
0698d44 · Update NN files (#271) · msluszniak, May 28, 2024
a66a3ad · Release v0.3.0 · josevalim, May 29, 2024
6fdf3a7 · Update ExDoc · josevalim, May 30, 2024
32a5b56 · use Nx.BinaryBackend for cv notebook (#274) · santiago-imelio, Jun 1, 2024
341301b · Added d2_pinball_score and d2_absolute_error_score (#277) · norm4nn, Jun 7, 2024
09c5ac6 · Update EXGBoost version in notebook (#279) · acalejos, Jun 8, 2024
433041f · Make nn algorithm configurable (#281) · msluszniak, Jun 14, 2024
accb6b7 · Manifold learning notebooks (#278) · msluszniak, Jun 16, 2024
fbe2089 · Update mix.exs (#282) · msluszniak, Jun 18, 2024
e0ada5e · Release v0.3.1 · josevalim, Jun 18, 2024
d570f48 · Remove Tucan.layers/1 with a single layer (#283) · jonatanklosko, Jun 20, 2024
8712e96 · Fix typo in dimensionality reduction notebook (#285) · krstopro, Jun 24, 2024
e08a802 · Fix various typos and improve language (#292) · preciz, Jul 30, 2024
66ec4c8 · Add partial_fit/2 and incremental_fit/2 to PCA (#291) · krstopro, Jul 30, 2024
e8a45a3 · Fix/linear shapes (#288) · JoaquinIglesiasTurina, Jul 30, 2024
7050d32 · Improvements to OrdinalEncoder, OneHotEncoder, NaiveBayes, LogisticRe… · krstopro, Aug 1, 2024
2dca4aa · Bug fix. (#299) · krstopro, Sep 4, 2024
1765930 · Split RadiusNearestNeighbors module into RNNClassifier and RNNRegress… · norm4nn, Sep 10, 2024
4dec8ba · Fix doctests by avoiding nesting #Nx.Tensor · josevalim, Sep 10, 2024
017e29b · RNN -> RadiusNN · josevalim, Sep 10, 2024
cea4657 · Add OPTICS clustering algorithm (#295) · norm4nn, Sep 12, 2024
975938a · Add batching to regression metrics (#297) · norm4nn, Sep 13, 2024
2a601cc · Add TruncatedSVD module (#302) · norm4nn, Oct 4, 2024
157beb8 · fixed tsvd bug · norm4nn, Oct 26, 2024
9e7c645 · added test · norm4nn, Oct 27, 2024
1231c13 · mix format · norm4nn, Oct 27, 2024
.github/workflows/ci.yml (2 changes: 1 addition, 1 deletion)

@@ -15,7 +15,7 @@ jobs:
       otp: "26.1"
       lint: true
     - elixir: "1.14.5"
-      otp: "25.3"
+      otp: "26.1"
     steps:
       - uses: actions/checkout@v2
CHANGELOG.md (35 changes: 33 additions, 2 deletions)

@@ -1,6 +1,37 @@
 # Changelog

-## v0.2.2-dev
+## v0.3.1 (2024-06-18)
+
+### Enhancements
+
+* Add a notebook about manifold learning
+* Make knn algorithm configurable on Trimap
+* Add `d2_pinball_score` and `d2_absolute_error_score`
+
+## v0.3.0 (2024-05-29)
+
+### Enhancements
+
+* Add LargeVis for visualization of large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space
+* Add `Scholar.Neighbors.KDTree` and `Scholar.Neighbors.RandomProjectionForest`
+* Add `Scholar.Metrics.Neighbors`
+* Add `Scholar.Linear.BayesianRidgeRegression`
+* Add `Scholar.Cluster.Hierarchical`
+* Add `Scholar.Manifold.Trimap`
+* Add Mean Pinball Loss function
+* Add Matthews Correlation Coefficient function
+* Add D2 Tweedie Score function
+* Add Mean Tweedie Deviance function
+* Add Discounted Cumulative Gain function
+* Add Precision Recall f-score function
+* Add f-beta score function
+* Add convergence check to AffinityPropagation
+* Default Affinity Propagation preference to `reduce_min` and make it customizable
+* Move preprocessing functionality to their own modules with `fit` and `fit_transform` callbacks
+
+### Breaking changes
+
+* Split `KNearestNeighbors` into `KNNClassifier` and `KNNRegressor` with custom algorithm support
+
 ## v0.2.1 (2023-08-30)

@@ -21,7 +52,7 @@ This version requires Elixir v1.14+.
 * Add `t-SNE`
 * Add `Polynomial Regression`
 * Replace seeds with `Random.key`
-* Add 'unrolling loops' option
+* Add 'unrolling loops' option
 * Add support for custom optimizers in `Logistic Regression`
 * Add `Trapezoidal Integration`
 * Add `AUC-ROC`, `AUC`, and `ROC Curve`
README.md (8 changes: 4 additions, 4 deletions)

@@ -23,7 +23,7 @@ Add to your `mix.exs`:
 ```elixir
 def deps do
   [
-    {:scholar, "~> 0.2.1"}
+    {:scholar, "~> 0.3.0"}
   ]
 end
 ```

@@ -34,7 +34,7 @@ such as EXLA:
 ```elixir
 def deps do
   [
-    {:scholar, "~> 0.2.1"},
+    {:scholar, "~> 0.3.0"},
     {:exla, ">= 0.0.0"}
   ]
 end

@@ -64,12 +64,12 @@ To use Scholar inside code notebooks, run:

 ```elixir
 Mix.install([
-  {:scholar, "~> 0.2.1"},
+  {:scholar, "~> 0.3.0"},
   {:exla, ">= 0.0.0"}
 ])

 Nx.global_default_backend(EXLA.Backend)
-# Client can also be set to :cuda / :romc
+# Client can also be set to :cuda / :rocm
 Nx.Defn.global_default_options(compiler: EXLA, client: :host)
 ```
benchmarks/kd_tree.exs (15 changes: 0 additions, 15 deletions)

This file was deleted.
benchmarks/knn.exs (9 changes: 7 additions, 2 deletions)

@@ -13,11 +13,16 @@ inputs_knn = %{
 Benchee.run(
   %{
     "kdtree" => fn x ->
-      kdtree = Scholar.Neighbors.KDTree.fit_bounded(x, Nx.axis_size(x, 0))
+      kdtree = Scholar.Neighbors.KDTree.fit(x)
       Scholar.Neighbors.KDTree.predict(kdtree, x, k: 4)
     end,
     "brute force knn" => fn x ->
-      model = Scholar.Neighbors.KNearestNeighbors.fit(x, Nx.broadcast(1, {Nx.axis_size(x, 0)}), num_classes: 2, num_neighbors: 4)
+      model =
+        Scholar.Neighbors.KNearestNeighbors.fit(x, Nx.broadcast(1, {Nx.axis_size(x, 0)}),
+          num_classes: 2,
+          num_neighbors: 4
+        )
+
       Scholar.Neighbors.KNearestNeighbors.k_neighbors(model, x)
     end
   },
lib/scholar/cluster/affinity_propagation.ex (8 changes: 4 additions, 4 deletions)

@@ -1,9 +1,9 @@
 defmodule Scholar.Cluster.AffinityPropagation do
   @moduledoc """
   Model representing affinity propagation clustering. The first dimension
-  of `:clusters_centers` is set to the number of samples in the dataset.
-  The artificial centers are filled with `:infinity` values. To fillter
-  them out use `prune` function.
+  of `:cluster_centers` is set to the number of samples in the dataset.
+  The artificial centers are filled with `:infinity` values. To filter
+  them out use the `prune` function.

   The algorithm has a time complexity of the order $O(N^2T)$, where $N$ is
   the number of samples and $T$ is the number of iterations until convergence.

@@ -91,7 +91,7 @@ defmodule Scholar.Cluster.AffinityPropagation do

   The function returns a struct with the following parameters:

-    * `:clusters_centers` - Cluster centers from the initial data.
+    * `:cluster_centers` - Cluster centers from the initial data.

     * `:cluster_centers_indices` - Indices of cluster centers.
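The `prune/1` step referenced in the corrected moduledoc, as a hedged sketch; the `fit/2` option names are assumed from the Scholar docs rather than from this diff:

```elixir
# Sketch: fit affinity propagation, then drop the artificial :infinity
# rows from :cluster_centers with prune/1 before predicting.
key = Nx.Random.key(7)
{x, _key} = Nx.Random.uniform(key, shape: {30, 2})

model = Scholar.Cluster.AffinityPropagation.fit(x, key: key)
pruned = Scholar.Cluster.AffinityPropagation.prune(model)
Scholar.Cluster.AffinityPropagation.predict(pruned, x)
```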
lib/scholar/cluster/dbscan.ex (18 changes: 10 additions, 8 deletions)

@@ -32,15 +32,17 @@ defmodule Scholar.Cluster.DBSCAN do
         type: :integer
       ],
       metric: [
-        type: {:custom, Scholar.Options, :metric, []},
-        default: {:minkowski, 2},
+        type: {:custom, Scholar.Neighbors.Utils, :pairwise_metric, []},
+        default: &Scholar.Metrics.Distance.pairwise_minkowski/2,
         doc: ~S"""
-        Name of the metric. Possible values:
+        The function that measures the pairwise distance between two points. Possible values:

         * `{:minkowski, p}` - Minkowski metric. By changing value of `p` parameter (a positive number or `:infinity`)
-        we can set Manhattan (`1`), Euclidean (`2`), Chebyshev (`:infinity`), or any arbitrary $L_p$ metric.
+          we can set Manhattan (`1`), Euclidean (`2`), Chebyshev (`:infinity`), or any arbitrary $L_p$ metric.

         * `:cosine` - Cosine metric.
+
+        * Anonymous function of arity 2 that takes two rank-2 tensors.
         """
       ],
       weights: [

@@ -96,17 +98,17 @@ defmodule Scholar.Cluster.DBSCAN do
     y_dummy = Nx.broadcast(Nx.tensor(0), {num_samples})

     neighbor_model =
-      Scholar.Neighbors.RadiusNearestNeighbors.fit(x, y_dummy,
+      Scholar.Neighbors.RadiusNNClassifier.fit(x, y_dummy,
         num_classes: 1,
         radius: opts[:eps],
         metric: opts[:metric]
       )

     {_dist, indices} =
-      Scholar.Neighbors.RadiusNearestNeighbors.radius_neighbors(neighbor_model, x)
+      Scholar.Neighbors.RadiusNNClassifier.radius_neighbors(neighbor_model, x)

-    n_neigbors = Nx.sum(indices * weights, axes: [1])
-    core_samples = n_neigbors >= opts[:min_samples]
+    n_neighbors = Nx.sum(indices * weights, axes: [1])
+    core_samples = n_neighbors >= opts[:min_samples]
     labels = dbscan_inner(core_samples, indices)

     %__MODULE__{
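To illustrate the new `:metric` contract above, a minimal sketch: any arity-2 function over rank-2 tensors is accepted, alongside the `pairwise_minkowski/2` default (data values invented for illustration):

```elixir
x = Nx.tensor([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9]])

# Explicit default: pairwise Minkowski with p = 2 (Euclidean).
Scholar.Cluster.DBSCAN.fit(x,
  eps: 1.0,
  min_samples: 2,
  metric: &Scholar.Metrics.Distance.pairwise_minkowski/2
)

# Custom pairwise metric: squared Euclidean, built by broadcasting
# {n, 1, d} against {1, m, d} and summing over the feature axis.
pairwise_sq_euclidean = fn a, b ->
  a
  |> Nx.new_axis(1)
  |> Nx.subtract(Nx.new_axis(b, 0))
  |> Nx.pow(2)
  |> Nx.sum(axes: [-1])
end

Scholar.Cluster.DBSCAN.fit(x, eps: 1.0, min_samples: 2, metric: pairwise_sq_euclidean)
```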
lib/scholar/cluster/hierarchical.ex (3 changes: 1 addition, 2 deletions)

@@ -158,8 +158,7 @@ defmodule Scholar.Cluster.Hierarchical do

     dendrogram_fun =
       case linkage do
-        # TODO: :centroid, :median
-        # TODO: :ward
+        # TODO: :centroid, :median, :ward
         l when l in [:average, :complete, :single, :weighted] ->
          &parallel_nearest_neighbor/3
       end
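A quick sketch of the four linkages the `case` above currently dispatches (`:centroid`, `:median`, and `:ward` remain TODO); `fit/2` taking a `:linkage` option is an assumption from the Scholar docs:

```elixir
# Sketch: exercise each supported linkage on a toy dataset.
x = Nx.tensor([[1.0, 1.0], [1.5, 1.2], [8.0, 8.0], [8.5, 8.1]])

for linkage <- [:average, :complete, :single, :weighted] do
  Scholar.Cluster.Hierarchical.fit(x, linkage: linkage)
end
```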
lib/scholar/cluster/k_means.ex (2 changes: 1 addition, 1 deletion)

@@ -2,7 +2,7 @@ defmodule Scholar.Cluster.KMeans do
   @moduledoc """
   K-Means Algorithm.

-  K-Means is simple clustering method that works iteratively [1]. In the first iteration,
+  K-Means is a simple clustering method that works iteratively [1]. In the first iteration,
   centroids are chosen randomly from input data. It turned out that some initializations
   are especially effective. In 2007 David Arthur and Sergei Vassilvitskii proposed initialization
   called k-means++ which speed up convergence of algorithm drastically [2]. After initialization, from each centroid
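A minimal sketch of the k-means++ flow the moduledoc describes; the option names (`:num_clusters`, `:init`, `:key`) are assumed from the Scholar docs, not from this diff:

```elixir
# Sketch: fit two clusters with k-means++ initialization, then predict.
key = Nx.Random.key(0)
x = Nx.tensor([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [8.0, 8.0], [8.1, 7.9]])

model =
  Scholar.Cluster.KMeans.fit(x,
    num_clusters: 2,
    init: :k_means_plus_plus,
    key: key
  )

Scholar.Cluster.KMeans.predict(model, Nx.tensor([[0.9, 1.0]]))
```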