For a 0.20.4 release #1120

Merged · 46 commits · May 20, 2024

Commits
* a89faf7: Use repl language tag for sample (abhro, Apr 22, 2024)
* 8e45385: Update language tags for code samples (abhro, Apr 22, 2024)
* fddc289: Follow blue style in docs/src/working_with_categorical_data.md (abhro, Apr 24, 2024)
* 3d6d15f: Update mlj_cheatsheet.md (abhro, Apr 29, 2024)
* ae28151: Consistenly use @example in common_mlj_workflows.md (abhro, Apr 30, 2024)
* 9f274ad: Fix @example namespace in common workflows (abhro, May 3, 2024)
* 367db46: Break up predicting transformers into separate @example blocks (abhro, May 3, 2024)
* f86b01b: Use @example instead of pre-built repl sample in learning_networks.md (abhro, May 3, 2024)
* dc71382: Merge branch 'dev' into patch-1 (abhro, May 3, 2024)
* 166a6f2: remove examples/telco (ablaom, May 5, 2024)
* e341344: add DFKI logo to list of sponsors (on README.md) (ablaom, May 6, 2024)
* 972e018: Merge pull request #1114 from JuliaAI/remove-telco-example (ablaom, May 6, 2024)
* cda789b: update ORGANIZATION.md (ablaom, May 8, 2024)
* a719bd3: doc tweak (ablaom, May 8, 2024)
* 18ba90f: suppress model-generated warnings in integration tests (ablaom, May 9, 2024)
* ce4bce2: Merge branch 'dev' into patch-1 (abhro, May 11, 2024)
* c1c7288: some tidy up (ablaom, May 12, 2024)
* f582a2e: add progress meter (ablaom, May 13, 2024)
* 946eac2: re-instate some models in integration tests (ablaom, May 13, 2024)
* 47bc47d: fix scope issue (ablaom, May 13, 2024)
* b0bd180: fix progress meter (ablaom, May 13, 2024)
* ccd5847: tweak (ablaom, May 13, 2024)
* fc7c2cd: tweak (ablaom, May 14, 2024)
* 211bcf9: Do mechanical fixes of spacing, semicolons, and punc (abhro, May 15, 2024)
* c7b5d3a: Fix indentation of markdown line (abhro, May 15, 2024)
* 925ec42: Move hidden example block to setup (abhro, May 15, 2024)
* 2a1202f: Pull code sample into list (abhro, May 15, 2024)
* f8518f4: Use proper markdown lists (abhro, May 15, 2024)
* ad9129b: Use example block for workflows (abhro, May 15, 2024)
* da2e45a: Remove lambdas (abhro, May 15, 2024)
* 72f2be2: Use repl blocks for user defined models (abhro, May 15, 2024)
* c24a96b: Use bigger fences for cheatsheet code (abhro, May 15, 2024)
* 331bac8: Promote headers in cheatsheet (abhro, May 15, 2024)
* 0acc876: Use Clustering.jl instead of ParallelKMeans (abhro, May 15, 2024)
* dcf6a21: tweak (ablaom, May 15, 2024)
* 18e9c9f: Remove unsupported use of info() from cheatsheet (abhro, May 15, 2024)
* 739ca21: Remove comments to have not as wide code lines (abhro, May 15, 2024)
* f211322: Add description of data coercion in cheatsheet (abhro, May 15, 2024)
* d079644: Update docs/src/mlj_cheatsheet.md (abhro, May 15, 2024)
* 650ebbd: Remove other occurence of `info` on measure (abhro, May 16, 2024)
* aa0cf90: Upgrading MLJFlow.jl to v0.4.2 (pebeto, May 18, 2024)
* bd08451: Merge pull request #1115 from JuliaAI/tweak-to-integration-tests (ablaom, May 19, 2024)
* 2745563: Merge pull request #1107 from abhro/patch-1 (ablaom, May 19, 2024)
* b8d19dc: Merge pull request #1118 from JuliaAI/upgrading_mljflow_to_v0.5.1 (ablaom, May 20, 2024)
* 5e909f5: doc tweak (ablaom, May 20, 2024)
* 6a57430: bump 0.20.4 (ablaom, May 20, 2024)
143 changes: 63 additions & 80 deletions ORGANIZATION.md
````diff
@@ -8,102 +8,85 @@ connections do not currently exist but are planned/proposed.*
 Repositories of some possible interest outside of MLJ, or beyond
 its conventional use, are marked with a ⟂ symbol:
 
-* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the
-  general user's point-of-entry for choosing, loading, composing,
-  evaluating and tuning machine learning models. It pulls in most code
-  from other repositories described below. MLJ also hosts the [MLJ
-  manual](src/docs) which documents functionality across the
-  repositories, with the exception of ScientificTypesBase, and
-  MLJScientific types which host their own documentation. (The MLJ
-  manual and MLJTutorials do provide overviews of scientific types.)
-
-* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl)
-  is a lightweight package imported by packages implementing MLJ's
-  interface for their machine learning models. It's only dependencies
-  are ScientificTypesBase.jl (which depends only on the standard
-  library module `Random`) and
-  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
-  (which depends only on ScientificTypesBase.jl).
+* [MLJ.jl](https://github.com/JuliaAI/MLJ.jl) is the general user's point-of-entry for
+  choosing, loading, composing, evaluating and tuning machine learning models. It pulls in
+  most code from other repositories described below. MLJ also hosts the [MLJ
+  manual](src/docs) which documents functionality across the repositories, although some
+  pages point to documentation hosted locally by a particular package.
+
+* [MLJModelInterface.jl](https://github.com/JuliaAI/MLJModelInterface.jl) is a lightweight
+  package imported by packages implementing MLJ's interface for their machine learning
+  models. It's only dependencies are ScientificTypesBase.jl (which depends only on the
+  standard library module `Random`) and
+  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) (which depends
+  only on ScientificTypesBase.jl).
 
-* (⟂)
-  [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is
-  a large repository with two main purposes: (i) to give "dummy"
-  methods defined in MLJModelInterface their intended functionality
-  (which depends on third party packages, such as
+* (⟂) [MLJBase.jl](https://github.com/JuliaAI/MLJBase.jl) is a large repository with two
+  main purposes: (i) to give "dummy" methods defined in MLJModelInterface their intended
+  functionality (which depends on third party packages, such as
   [Tables.jl](https://github.com/JuliaData/Tables.jl),
-  [Distributions.jl](https://github.com/JuliaStats/Distributions.jl)
-  and
-  [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl));
-  and (ii) provide functionality essential to the MLJ user that has
-  not been relegated to its own "satellite" repository for some
-  reason. See the [MLJBase.jl
-  readme](https://github.com/JuliaAI/MLJBase.jl) for a
-  detailed description of MLJBase's contents.
+  [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) and
+  [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl)); and (ii)
+  provide functionality essential to the MLJ user that has not been relegated to its own
+  "satellite" repository for some reason. See the [MLJBase.jl
+  readme](https://github.com/JuliaAI/MLJBase.jl) for a detailed description of MLJBase's
+  contents.
 
-* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provifes
+* [StatisticalMeasures.jl](https://github.com/JuliaAI/StatisticalMeasures.jl) provides
   performance measures (metrics) such as losses and scores.
 
-* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl)
-  hosts the *MLJ model registry*, which contains metadata on all the
-  models the MLJ user can search and load from MLJ. Moreover, it
-  provides the functionality for **loading model code** from MLJ on
-  demand. Finally, it furnishes some commonly used transformers for
-  data pre-processing, such as `ContinuousEncoder` and `Standardizer`.
+* [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl) hosts the *MLJ model registry*,
+  which contains metadata on all the models the MLJ user can search and load from
+  MLJ. Moreover, it provides the functionality for **loading model code** from MLJ on
+  demand. Finally, it furnishes some commonly used transformers for data pre-processing,
+  such as `ContinuousEncoder` and `Standardizer`.
 
-* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl)
-  provides MLJ's `TunedModel` wrapper for hyper-parameter
-  optimization, including the extendable API for tuning strategies,
-  and selected in-house implementations, such as `Grid` and
-  `RandomSearch`.
+* [MLJTuning.jl](https://github.com/JuliaAI/MLJTuning.jl) provides MLJ's `TunedModel`
+  wrapper for hyper-parameter optimization, including the extendable API for tuning
+  strategies, and selected in-house implementations, such as `Grid` and `RandomSearch`.
 
-* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl)
-  provides MLJ's `EnsembleModel` wrapper, for creating homogenous
-  model ensembles.
+* [MLJEnsembles.jl](https://github.com/JuliaAI/MLJEnsembles.jl) provides MLJ's
+  `EnsembleModel` wrapper, for creating homogeneous model ensembles.
 
-* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl)
-  provides the `IteratedModel` wrapper for controlling iterative
-  models (snapshots, early stopping criteria, etc)
+* [MLJIteration.jl](https://github.com/JuliaAI/MLJIteration.jl) provides the
+  `IteratedModel` wrapper for controlling iterative models (snapshots, early stopping
+  criteria, etc)
 
-* (⟂)
-  [OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides
-  integration with the [OpenML](https://www.openml.org) data science
-  exchange platform
+* [MLJFlow.jl](https://github.com/JuliaAI/MLJFlow.jl) provides integration with the
+  platform-agnostic machine learning tracking tool [MLflow](https://mlflow.org).
 
-* (⟂)
-  [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl)
-  is an experimental package for a wide range of julia-native penalized linear models
-  such as Lasso, Elastic-Net, Robust regression, LAD regression,
-  etc.
+* (⟂) [OpenML.jl](https://github.com/JuliaAI/OpenML.jl) provides integration with the
+  [OpenML](https://www.openml.org) data science exchange platform
+
+* (⟂) [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl) provides a wide
+  range of julia-native penalized linear models such as Lasso, Elastic-Net, Robust
+  regression, LAD regression, etc.
 
-* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) an experimental
-  package for gradient-descent models, such as traditional
-  neural-networks, built with
+* [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl) an experimental package for
+  gradient-descent models, such as traditional neural-networks, built with
   [Flux.jl](https://github.com/FluxML/Flux.jl), in MLJ.
 
-* (⟂)
-  [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl)
-  is an ultra lightweight package providing "scientific" types,
-  such as `Continuous`, `OrderedFactor`, `Image` and `Table`. It's
-  purpose is to formalize conventions around the scientific
-  interpretation of ordinary machine types, such as `Float32` and
+* (⟂) [ScientificTypesBase.jl](https://github.com/JuliaAI/ScientificTypesBase.jl) is an
+  ultra lightweight package providing "scientific" types, such as `Continuous`,
+  `OrderedFactor`, `Image` and `Table`. It's purpose is to formalize conventions around
+  the scientific interpretation of ordinary machine types, such as `Float32` and
   `DataFrame`.
 
-* (⟂)
-  [ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl)
-  articulates the particular convention for the scientific interpretation of
-  data that MLJ adopts
+* (⟂) [ScientificTypes.jl](https://github.com/JuliaAI/ScientificTypes.jl) articulates the
+  particular convention for the scientific interpretation of data that MLJ adopts
 
-* (⟂)
-  [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl)
-  An ultra lightweight package defining fall-back implementations for
-  a collection of traits possessed by statistical objects, principally
-  models and measures (metrics).
+* (⟂) [StatisticalTraits.jl](https://github.com/JuliaAI/StatisticalTraits.jl) An ultra
+  lightweight package defining fall-back implementations for a collection of traits
+  possessed by statistical objects, principally models and measures (metrics).
 
-* (⟂)
-  [DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl)
-  collects tutorials on how to use MLJ, which are deployed
+* (⟂) [DataScienceTutorials](https://github.com/JuliaAI/DataScienceTutorials.jl) collects
+  tutorials on how to use MLJ, which are deployed
   [here](https://JuliaAI.github.io/DataScienceTutorials.jl/)
 
-* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl)
-  provides tests for implementations of the MLJ model interface, and
-  integration tests for the entire MLJ ecosystem
+* [MLJTestInterface](https://github.com/JuliaAI/MLJTestInterface.jl) provides tests for
+  implementations of the MLJ model interface
+
+* [MLJTestIntegration](https://github.com/JuliaAI/MLJTestIntegration.jl) provides tests
+  for the entire MLJ ecosystem. (Called when you run `ENV["MLJ_TEST_INTEGRATION"]="true";
+  Pkg.test("MLJ")`.
````
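As the final bullet above notes, the integration suite is opt-in. A minimal sketch of the invocation it describes, assuming MLJ is the package under test in the active environment:

```julia-repl
julia> ENV["MLJ_TEST_INTEGRATION"] = "true";  # opt in to the ecosystem-wide tests

julia> using Pkg

julia> Pkg.test("MLJ")  # now also runs the MLJTestIntegration suite, not just the unit tests
```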
7 changes: 4 additions & 3 deletions Project.toml
````diff
@@ -1,7 +1,7 @@
 name = "MLJ"
 uuid = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
 authors = ["Anthony D. Blaom <[email protected]>"]
-version = "0.20.3"
+version = "0.20.4"
 
 [deps]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -34,7 +34,7 @@ Distributions = "0.21,0.22,0.23, 0.24, 0.25"
 MLJBalancing = "0.1"
 MLJBase = "1"
 MLJEnsembles = "0.4"
-MLJFlow = "0.4"
+MLJFlow = "0.4.2"
 MLJIteration = "0.6"
 MLJModels = "0.16"
 MLJTestIntegration = "0.5.0"
@@ -84,8 +84,9 @@ PartitionedLS = "19f41c5e-8610-11e9-2f2a-0d67e7c5027f"
 SIRUS = "cdeec39e-fb35-4959-aadb-a1dd5dede958"
 SelfOrganizingMaps = "ba4b7379-301a-4be0-bee6-171e4e152787"
 StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
 SymbolicRegression = "8254be44-1295-4e6a-a16d-46603ac705cb"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Test"]
+test = ["BetaML", "CatBoost", "EvoLinear", "EvoTrees", "Imbalance", "InteractiveUtils", "LightGBM", "MLJClusteringInterface", "MLJDecisionTreeInterface", "MLJFlux", "MLJGLMInterface", "MLJLIBSVMInterface", "MLJLinearModels", "MLJMultivariateStatsInterface", "MLJNaiveBayesInterface", "MLJScikitLearnInterface", "MLJTSVDInterface", "MLJTestInterface", "MLJTestIntegration", "MLJText", "MLJXGBoostInterface", "Markdown", "NearestNeighborModels", "OneRule", "OutlierDetectionNeighbors", "OutlierDetectionPython", "ParallelKMeans", "PartialLeastSquaresRegressor", "PartitionedLS", "SelfOrganizingMaps", "SIRUS", "SymbolicRegression", "StableRNGs", "Suppressor","Test"]
````
3 changes: 2 additions & 1 deletion README.md
````diff
@@ -42,14 +42,15 @@ framework?** Start [here](https://JuliaAI.github.io/MLJ.jl/dev/quick_start_guide
 
 MLJ was initially created as a Tools, Practices and Systems project at
 the [Alan Turing Institute](https://www.turing.ac.uk/)
-in 2019. Current funding is provided by a [New Zealand Strategic
+in 2019. Funding has also been provided by a [New Zealand Strategic
 Science Investment
 Fund](https://www.mbie.govt.nz/science-and-technology/science-and-innovation/funding-information-and-opportunities/investment-funds/strategic-science-investment-fund/ssif-funded-programmes/university-of-auckland/)
 awarded to the University of Auckland.
 
 MLJ has been developed with the support of the following organizations:
 
 <div align="center">
+    <img src="material/DFKI.png" width = 100/>
     <img src="material/Turing_logo.png" width = 100/>
     <img src="material/UoA_logo.png" width = 100/>
     <img src="material/IQVIA_logo.png" width = 100/>
````
24 changes: 11 additions & 13 deletions docs/src/about_mlj.md (file mode changed: 100755 → 100644)
````diff
@@ -1,6 +1,6 @@
 # About MLJ
 
-MLJ (Machine Learning in Julia) is a toolbox written in Julia 
+MLJ (Machine Learning in Julia) is a toolbox written in Julia
 providing a common interface and meta-algorithms for selecting,
 tuning, evaluating, composing and comparing [over 180 machine learning
 models](@ref model_list) written in Julia and other languages. In
@@ -22,8 +22,7 @@ The first code snippet below creates a new Julia environment
 [Installation](@ref) for more on creating a Julia environment for use
 with MLJ.
 
-Julia installation instructions are
-[here](https://julialang.org/downloads/).
+Julia installation instructions are [here](https://julialang.org/downloads/).
 
 ```julia
 using Pkg
@@ -44,7 +43,7 @@ Loading and instantiating a gradient tree-boosting model:
 using MLJ
 Booster = @load EvoTreeRegressor # loads code defining a model type
 booster = Booster(max_depth=2) # specify hyper-parameter at construction
-booster.nrounds=50 # or mutate afterwards
+booster.nrounds = 50 # or mutate afterwards
 ```
 
 This model is an example of an iterative model. As it stands, the
@@ -92,7 +91,7 @@ it "self-tuning":
 ```julia
 self_tuning_pipe = TunedModel(model=pipe,
                               tuning=RandomSearch(),
-                              ranges = max_depth_range,
+                              ranges=max_depth_range,
                               resampling=CV(nfolds=3, rng=456),
                               measure=l1,
                               acceleration=CPUThreads(),
@@ -105,12 +104,12 @@ Loading a selection of features and labels from the Ames
 House Price dataset:
 
 ```julia
-X, y = @load_reduced_ames;
+X, y = @load_reduced_ames
 ```
 Evaluating the "self-tuning" pipeline model's performance using 5-fold
 cross-validation (implies multiple layers of nested resampling):
 
-```julia
+```julia-repl
 julia> evaluate(self_tuning_pipe, X, y,
                 measures=[l1, l2],
                 resampling=CV(nfolds=5, rng=123),
@@ -155,8 +154,7 @@ Extract:
 
 * Consistent interface to handle probabilistic predictions.
 
-* Extensible [tuning
-  interface](https://github.com/JuliaAI/MLJTuning.jl),
+* Extensible [tuning interface](https://github.com/JuliaAI/MLJTuning.jl),
   to support a growing number of optimization strategies, and designed
   to play well with model composition.
 
@@ -229,19 +227,19 @@ installed in a new
 [environment](https://julialang.github.io/Pkg.jl/v1/environments/) to
 avoid package conflicts. You can do this with
 
-```julia
+```julia-repl
 julia> using Pkg; Pkg.activate("my_MLJ_env", shared=true)
 ```
 
 Installing MLJ is also done with the package manager:
 
-```julia
+```julia-repl
 julia> Pkg.add("MLJ")
 ```
 
 **Optional:** To test your installation, run
 
-```julia
+```julia-repl
 julia> Pkg.test("MLJ")
 ```
 
@@ -252,7 +250,7 @@ environment to make model-specific code available. This
 happens automatically when you use MLJ's interactive load command
 `@iload`, as in
 
-```julia
+```julia-repl
 julia> Tree = @iload DecisionTreeClassifier # load type
 julia> tree = Tree() # instance
 ```
````
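Pulled out of the hunks above, the manual's self-tuning workflow reads roughly as follows. This is a sketch only: it assumes the `pipe` and `max_depth_range` definitions that appear on unchanged parts of the manual page, and it omits keyword arguments hidden by the collapsed diff context:

```julia
using MLJ

# features and labels from the Ames House Price dataset
X, y = @load_reduced_ames

# `pipe` and `max_depth_range` are defined earlier on the manual page
self_tuning_pipe = TunedModel(model=pipe,
                              tuning=RandomSearch(),
                              ranges=max_depth_range,
                              resampling=CV(nfolds=3, rng=456),
                              measure=l1,
                              acceleration=CPUThreads())

# nested resampling: 5-fold CV outside, the 3-fold tuning CV inside
evaluate(self_tuning_pipe, X, y,
         measures=[l1, l2],
         resampling=CV(nfolds=5, rng=123))
```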
2 changes: 1 addition & 1 deletion docs/src/adding_models_for_general_use.md (file mode changed: 100755 → 100644)
````diff
@@ -5,4 +5,4 @@ suitable for addition to the MLJ Model Registry, consult the [MLJModelInterface.
 documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).
 
 For quick-and-dirty user-defined models see [Simple User Defined
-Models](simple_user_defined_models.md).
\ No newline at end of file
+Models](simple_user_defined_models.md).
````
Empty file modified: docs/src/api.md (file mode changed: 100755 → 100644)