For a 0.20.4 release #1120

Merged · 46 commits · May 20, 2024

Commits
* a89faf7: Use repl language tag for sample (abhro, Apr 22, 2024)
* 8e45385: Update language tags for code samples (abhro, Apr 22, 2024)
* fddc289: Follow blue style in docs/src/working_with_categorical_data.md (abhro, Apr 24, 2024)
* 3d6d15f: Update mlj_cheatsheet.md (abhro, Apr 29, 2024)
* ae28151: Consistenly use @example in common_mlj_workflows.md (abhro, Apr 30, 2024)
* 9f274ad: Fix @example namespace in common workflows (abhro, May 3, 2024)
* 367db46: Break up predicting transformers into separate @example blocks (abhro, May 3, 2024)
* f86b01b: Use @example instead of pre-built repl sample in learning_networks.md (abhro, May 3, 2024)
* dc71382: Merge branch 'dev' into patch-1 (abhro, May 3, 2024)
* 166a6f2: remove examples/telco (ablaom, May 5, 2024)
* e341344: add DFKI logo to list of sponsors (on README.md) (ablaom, May 6, 2024)
* 972e018: Merge pull request #1114 from JuliaAI/remove-telco-example (ablaom, May 6, 2024)
* cda789b: update ORGANIZATION.md (ablaom, May 8, 2024)
* a719bd3: doc tweak (ablaom, May 8, 2024)
* 18ba90f: suppress model-generated warnings in integration tests (ablaom, May 9, 2024)
* ce4bce2: Merge branch 'dev' into patch-1 (abhro, May 11, 2024)
* c1c7288: some tidy up (ablaom, May 12, 2024)
* f582a2e: add progress meter (ablaom, May 13, 2024)
* 946eac2: re-instate some models in integration tests (ablaom, May 13, 2024)
* 47bc47d: fix scope issue (ablaom, May 13, 2024)
* b0bd180: fix progress meter (ablaom, May 13, 2024)
* ccd5847: tweak (ablaom, May 13, 2024)
* fc7c2cd: tweak (ablaom, May 14, 2024)
* 211bcf9: Do mechanical fixes of spacing, semicolons, and punc (abhro, May 15, 2024)
* c7b5d3a: Fix indentation of markdown line (abhro, May 15, 2024)
* 925ec42: Move hidden example block to setup (abhro, May 15, 2024)
* 2a1202f: Pull code sample into list (abhro, May 15, 2024)
* f8518f4: Use proper markdown lists (abhro, May 15, 2024)
* ad9129b: Use example block for workflows (abhro, May 15, 2024)
* da2e45a: Remove lambdas (abhro, May 15, 2024)
* 72f2be2: Use repl blocks for user defined models (abhro, May 15, 2024)
* c24a96b: Use bigger fences for cheatsheet code (abhro, May 15, 2024)
* 331bac8: Promote headers in cheatsheet (abhro, May 15, 2024)
* 0acc876: Use Clustering.jl instead of ParallelKMeans (abhro, May 15, 2024)
* dcf6a21: tweak (ablaom, May 15, 2024)
* 18e9c9f: Remove unsupported use of info() from cheatsheet (abhro, May 15, 2024)
* 739ca21: Remove comments to have not as wide code lines (abhro, May 15, 2024)
* f211322: Add description of data coercion in cheatsheet (abhro, May 15, 2024)
* d079644: Update docs/src/mlj_cheatsheet.md (abhro, May 15, 2024)
* 650ebbd: Remove other occurence of `info` on measure (abhro, May 16, 2024)
* aa0cf90: Upgrading MLJFlow.jl to v0.4.2 (pebeto, May 18, 2024)
* bd08451: Merge pull request #1115 from JuliaAI/tweak-to-integration-tests (ablaom, May 19, 2024)
* 2745563: Merge pull request #1107 from abhro/patch-1 (ablaom, May 19, 2024)
* b8d19dc: Merge pull request #1118 from JuliaAI/upgrading_mljflow_to_v0.5.1 (ablaom, May 20, 2024)
* 5e909f5: doc tweak (ablaom, May 20, 2024)
* 6a57430: bump 0.20.4 (ablaom, May 20, 2024)
143 changes: 63 additions & 80 deletions ORGANIZATION.md
````diff
@@ -8,102 +8,85 @@ connections do not currently exist but are planned/proposed.*
 Repositories of some possible interest outside of MLJ, or beyond
 its conventional use, are marked with a ⟂ symbol:
 
-* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the
-  general user's point-of-entry for choosing, loading, composing,
-  evaluating and tuning machine learning models. It pulls in most code
-  from other repositories described below. MLJ also hosts the [MLJ
-  manual](src/docs) which documents functionality across the
-  repositories, with the exception of ScientificTypesBase, and
-  MLJScientific types which host their own documentation. (The MLJ
-  manual and MLJTutorials do provide overviews of scientific types.)
-
-* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl)
-  is a lightweight package imported by packages implementing MLJ's
-  interface for their machine learning models. It's only dependencies
-  are ScientificTypesBase.jl (which depends only on the standard
-  library module `Random`) and
-  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
-  (which depends only on ScientificTypesBase.jl).
+* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the general user's point-of-entry for
+  choosing, loading, composing, evaluating and tuning machine learning models. It pulls in
+  most code from other repositories described below. MLJ also hosts the [MLJ
+  manual](src/docs) which documents functionality across the repositories, although some
+  pages point to documentation hosted locally by a particular package.
+
+* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl) is a lightweight
+  package imported by packages implementing MLJ's interface for their machine learning
+  models. It's only dependencies are ScientificTypesBase.jl (which depends only on the
+  standard library module `Random`) and
+  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) (which depends
+  only on ScientificTypesBase.jl).
 
-* (⟂)
-  [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is
-  a large repository with two main purposes: (i) to give "dummy"
-  methods defined in MLJModelInterface their intended functionality
-  (which depends on third party packages, such as
+* (⟂) [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is a large repository with two
+  main purposes: (i) to give "dummy" methods defined in MLJModelInterface their intended
+  functionality (which depends on third party packages, such as
   [Tables.jl](https://github.com/JuliaData/Tables.jl),
-  [Distributions.jl](https://github.com/JuliaStats/Distributions.jl)
-  and
-  [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl));
-  and (ii) provide functionality essential to the MLJ user that has
-  not been relegated to its own "satellite" repository for some
-  reason. See the [MLJBase.jl
-  readme](https://github.com/JuliaAI/MLJBase.jl) for a
-  detailed description of MLJBase's contents.
+  [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) and
+  [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl)); and (ii)
+  provide functionality essential to the MLJ user that has not been relegated to its own
+  "satellite" repository for some reason. See the [MLJBase.jl
+  readme](https://github.com/JuliaAI/MLJBase.jl) for a detailed description of MLJBase's
+  contents.
 
-* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provifes
+* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provides
   performance measures (metrics) such as losses and scores.
 
-* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl)
-  hosts the *MLJ model registry*, which contains metadata on all the
-  models the MLJ user can search and load from MLJ. Moreover, it
-  provides the functionality for **loading model code** from MLJ on
-  demand. Finally, it furnishes some commonly used transformers for
-  data pre-processing, such as `ContinuousEncoder` and `Standardizer`.
+* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl) hosts the *MLJ model registry*,
+  which contains metadata on all the models the MLJ user can search and load from
+  MLJ. Moreover, it provides the functionality for **loading model code** from MLJ on
+  demand. Finally, it furnishes some commonly used transformers for data pre-processing,
+  such as `ContinuousEncoder` and `Standardizer`.
 
-* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl)
-  provides MLJ's `TunedModel` wrapper for hyper-parameter
-  optimization, including the extendable API for tuning strategies,
-  and selected in-house implementations, such as `Grid` and
-  `RandomSearch`.
+* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl) provides MLJ's `TunedModel`
+  wrapper for hyper-parameter optimization, including the extendable API for tuning
+  strategies, and selected in-house implementations, such as `Grid` and `RandomSearch`.
 
-* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl)
-  provides MLJ's `EnsembleModel` wrapper, for creating homogenous
-  model ensembles.
+* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl) provides MLJ's
+  `EnsembleModel` wrapper, for creating homogeneous model ensembles.
 
-* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl)
-  provides the `IteratedModel` wrapper for controlling iterative
-  models (snapshots, early stopping criteria, etc)
+* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl) provides the
+  `IteratedModel` wrapper for controlling iterative models (snapshots, early stopping
+  criteria, etc)
 
-* (⟂)
-  [OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides
-  integration with the [OpenML](https://www.openml.org) data science
-  exchange platform
+* [MLJFlow.jl](https://github.com/JuliaAI/MLJFlow.jl) provides integration with the
+  platform-agnostic machine learning tracking tool [MLflow](https://mlflow.org).
 
-* (⟂)
-  [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl)
-  is an experimental package for a wide range of julia-native penalized linear models
-  such as Lasso, Elastic-Net, Robust regression, LAD regression,
-  etc.
+* (⟂) [OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides integration with the
+  [OpenML](https://www.openml.org) data science exchange platform
+
+* (⟂) [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl) provides a wide
+  range of julia-native penalized linear models such as Lasso, Elastic-Net, Robust
+  regression, LAD regression, etc.
 
-* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) an experimental
-  package for gradient-descent models, such as traditional
-  neural-networks, built with
+* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) an experimental package for
+  gradient-descent models, such as traditional neural-networks, built with
   [Flux.jl](https://github.com/FluxML/Flux.jl), in MLJ.
 
-* (⟂)
-  [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl)
-  is an ultra lightweight package providing "scientific" types,
-  such as `Continuous`, `OrderedFactor`, `Image` and `Table`. It's
-  purpose is to formalize conventions around the scientific
-  interpretation of ordinary machine types, such as `Float32` and
+* (⟂) [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl) is an
+  ultra lightweight package providing "scientific" types, such as `Continuous`,
+  `OrderedFactor`, `Image` and `Table`. It's purpose is to formalize conventions around
+  the scientific interpretation of ordinary machine types, such as `Float32` and
   `DataFrame`.
 
-* (⟂)
-  [ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl)
-  articulates the particular convention for the scientific interpretation of
-  data that MLJ adopts
+* (⟂) [ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl) articulates the
+  particular convention for the scientific interpretation of data that MLJ adopts
 
-* (⟂)
-  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
-  An ultra lightweight package defining fall-back implementations for
-  a collection of traits possessed by statistical objects, principally
-  models and measures (metrics).
+* (⟂) [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) An ultra
+  lightweight package defining fall-back implementations for a collection of traits
+  possessed by statistical objects, principally models and measures (metrics).
 
-* (⟂)
-  [DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl)
-  collects tutorials on how to use MLJ, which are deployed
+* (⟂) [DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl) collects
+  tutorials on how to use MLJ, which are deployed
   [here](https://JuliaAI.github.io/DataScienceTutorials.jl/)
 
-* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl)
-  provides tests for implementations of the MLJ model interface, and
-  integration tests for the entire MLJ ecosystem
+* [MLJTestInterface](https://github.com/JuliaAI/MLJTestInterface.jl) provides tests for
+  implementations of the MLJ model interface
+
+* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl) provides tests
+  for the entire MLJ ecosystem. (Called when you run `ENV["MLJ_TEST_INTEGRATION"]="true";
+  Pkg.test("MLJ")`.
````
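As the final bullet above notes, the integration suite is opt-in. A minimal sketch of the invocation it describes, assuming MLJ is the package under test in the active environment:

```julia-repl
julia> ENV["MLJ_TEST_INTEGRATION"] = "true";  # opt in to the ecosystem-wide tests

julia> using Pkg

julia> Pkg.test("MLJ")  # now also runs the MLJTestIntegration suite, not just the unit tests
```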
7 changes: 4 additions & 3 deletions Project.toml
````diff
@@ -1,7 +1,7 @@
 name = "MLJ"
 uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
 authors = ["Anthony D. Blaom <[email protected]>"]
-version = "0.20.3"
+version = "0.20.4"
 
 [deps]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -34,7 +34,7 @@ Distributions = "0.21,0.22,0.23, 0.24, 0.25"
 MLJBalancing = "0.1"
 MLJBase = "1"
 MLJEnsembles = "0.4"
-MLJFlow = "0.4"
+MLJFlow = "0.4.2"
 MLJIteration = "0.6"
 MLJModels = "0.16"
 MLJTestIntegration = "0.5.0"
@@ -84,8 +84,9 @@ PartitionedLS = "19f41c5e-8610-11e9-2f2a-0d67e7c5027f"
 SIRUS = "cdeec39e-fb35-4959-aadb-a1dd5dede958"
 SelfOrganizingMaps = "ba4b7379-301a-4be0-bee6-171e4e152787"
 StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
 SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Test"]
+test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Suppressor","Test"]
````
3 changes: 2 additions & 1 deletion README.md
````diff
@@ -42,14 +42,15 @@ framework?** Start [here](https://JuliaAI.github.io/MLJ.jl/dev/quick_start_guide
 
 MLJ was initially created as a Tools, Practices and Systems project at
 the [Alan Turing Institute](https://www.turing.ac.uk/)
-in 2019. Current funding is provided by a [New Zealand Strategic
+in 2019. Funding has also been provided by a [New Zealand Strategic
 Science Investment
 Fund](https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/)
 awarded to the University of Auckland.
 
 MLJ has been developed with the support of the following organizations:
 
 <div align="center">
+    <img src="material/DFKI.png" width = 100/>
     <img src="material/Turing_logo.png" width = 100/>
     <img src="material/UoA_logo.png" width = 100/>
     <img src="material/IQVIA_logo.png" width = 100/>
````
24 changes: 11 additions & 13 deletions docs/src/about_mlj.md (file mode changed: 100755 → 100644)
````diff
@@ -1,6 +1,6 @@
 # About MLJ
 
-MLJ (Machine Learning in Julia) is a toolbox written in Julia 
+MLJ (Machine Learning in Julia) is a toolbox written in Julia
 providing a common interface and meta-algorithms for selecting,
 tuning, evaluating, composing and comparing [over 180 machine learning
 models](@ref model_list) written in Julia and other languages. In
@@ -22,8 +22,7 @@ The first code snippet below creates a new Julia environment
 [Installation](@ref) for more on creating a Julia environment for use
 with MLJ.
 
-Julia installation instructions are
-[here](https://julialang.org/downloads/).
+Julia installation instructions are [here](https://julialang.org/downloads/).
 
 ```julia
 using Pkg
@@ -44,7 +43,7 @@ Loading and instantiating a gradient tree-boosting model:
 using MLJ
 Booster = @load EvoTreeRegressor # loads code defining a model type
 booster = Booster(max_depth=2) # specify hyper-parameter at construction
-booster.nrounds=50 # or mutate afterwards
+booster.nrounds = 50 # or mutate afterwards
 ```
 
 This model is an example of an iterative model. As it stands, the
@@ -92,7 +91,7 @@ it "self-tuning":
 ```julia
 self_tuning_pipe = TunedModel(model=pipe,
                               tuning=RandomSearch(),
-                              ranges = max_depth_range,
+                              ranges=max_depth_range,
                               resampling=CV(nfolds=3, rng=456),
                               measure=l1,
                               acceleration=CPUThreads(),
@@ -105,12 +104,12 @@ Loading a selection of features and labels from the Ames
 House Price dataset:
 
 ```julia
-X, y = @load_reduced_ames;
+X, y = @load_reduced_ames
 ```
 Evaluating the "self-tuning" pipeline model's performance using 5-fold
 cross-validation (implies multiple layers of nested resampling):
 
-```julia
+```julia-repl
 julia> evaluate(self_tuning_pipe, X, y,
                 measures=[l1, l2],
                 resampling=CV(nfolds=5, rng=123),
@@ -155,8 +154,7 @@ Extract:
 
 * Consistent interface to handle probabilistic predictions.
 
-* Extensible [tuning
-  interface](https://github.com/JuliaAI/MLJTuning.jl),
+* Extensible [tuning interface](https://github.com/JuliaAI/MLJTuning.jl),
   to support a growing number of optimization strategies, and designed
   to play well with model composition.
 
@@ -229,19 +227,19 @@ installed in a new
 [environment](https://julialang.github.io/Pkg.jl/v1/environments/) to
 avoid package conflicts. You can do this with
 
-```julia
+```julia-repl
 julia> using Pkg; Pkg.activate("my_MLJ_env", shared=true)
 ```
 
 Installing MLJ is also done with the package manager:
 
-```julia
+```julia-repl
 julia> Pkg.add("MLJ")
 ```
 
 **Optional:** To test your installation, run
 
-```julia
+```julia-repl
 julia> Pkg.test("MLJ")
 ```
 
@@ -252,7 +250,7 @@ environment to make model-specific code available. This
 happens automatically when you use MLJ's interactive load command
 `@iload`, as in
 
-```julia
+```julia-repl
 julia> Tree = @iload DecisionTreeClassifier # load type
 julia> tree = Tree() # instance
 ```
````
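Pulled out of the hunks above, the manual's self-tuning workflow reads roughly as follows. This is a sketch only: it assumes the `pipe` and `max_depth_range` definitions that appear on unchanged parts of the manual page, and it omits keyword arguments hidden by the collapsed diff context:

```julia
using MLJ

# features and labels from the Ames House Price dataset
X, y = @load_reduced_ames

# `pipe` and `max_depth_range` are defined earlier on the manual page
self_tuning_pipe = TunedModel(model=pipe,
                              tuning=RandomSearch(),
                              ranges=max_depth_range,
                              resampling=CV(nfolds=3, rng=456),
                              measure=l1,
                              acceleration=CPUThreads())

# nested resampling: 5-fold CV outside, the 3-fold tuning CV inside
evaluate(self_tuning_pipe, X, y,
         measures=[l1, l2],
         resampling=CV(nfolds=5, rng=123))
```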
2 changes: 1 addition & 1 deletion docs/src/adding_models_for_general_use.md (file mode changed: 100755 → 100644)
````diff
@@ -5,4 +5,4 @@ suitable for addition to the MLJ Model Registry, consult the [MLJModelInterface.
 documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).
 
 For quick-and-dirty user-defined models see [Simple User Defined
-Models](simple_user_defined_models.md).
\ No newline at end of file
+Models](simple_user_defined_models.md).
````
Empty file modified: docs/src/api.md (file mode changed: 100755 → 100644)