Merge pull request #199 from JuliaAI/dev

Generate new documentation. No new release
JuliaAI · May 9, 2024 · 66b7962
2 parents c8e2599 + 61a7240

Showing 22 changed files with 316 additions and 304 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 A light-weight interface for developers wanting to integrate
 machine learning models into
-[MLJ](https://github.com/alan-turing-institute/MLJ.jl).
+[MLJ](https://github.com/JuliaAI/MLJ.jl).
 
 
 | Linux | Coverage |
@@ -12,8 +12,8 @@ machine learning models into
 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliaai.github.io/MLJModelInterface.jl/dev/)
 
 
-[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) is a framework for evaluating,
+[MLJ](https://JuliaAI.github.io/MLJ.jl/dev/) is a framework for evaluating,
 combining and optimizing machine learning models in Julia. A third party package wanting
 to integrate their machine learning models into MLJ must import the module
 `MLJModelInterface` defined in this package, as described in the
-[documentation](https://juliaai.github.io/MLJModelInterface.jl/dev/).
+[documentation](https://JuliaAI.github.io/MLJModelInterface.jl/dev/).
diff --git a/docs/src/document_strings.md b/docs/src/document_strings.md
@@ -29,23 +29,39 @@ Your document string must include the following components, in order:
   implementation. Generally, defer details on the role of
   hyperparameters to the "Hyperparameters" section (see below).
 
-- Instructions on *how to import the model type* from MLJ (because a user can already inspect the doc-string in the Model Registry, without having loaded the code-providing package).
+- Instructions on *how to import the model type* from MLJ (because a user can
+  already inspect the doc-string in the Model Registry, without having loaded
+  the code-providing package).
 
 - Instructions on *how to instantiate* with default hyperparameters or with keywords.
 
-- A *Training data* section: explains how to bind a model to data in a machine with all possible signatures (eg, `machine(model, X, y)` but also `machine(model, X, y, w)` if, say, weights are supported);  the role and scitype requirements for each data argument should be itemized.
+- A *Training data* section: explains how to bind a model to data in a machine
+  with all possible signatures (e.g., `machine(model, X, y)` but also
+  `machine(model, X, y, w)` if, say, weights are supported); the role and
+  scitype requirements for each data argument should be itemized.
 
 - Instructions on *how to fit* the machine (in the same section).
 
 - A *Hyperparameters* section (unless there aren't any): an itemized list of the parameters, with defaults given.
 
-- An *Operations* section: each implemented operation (`predict`, `predict_mode`, `transform`, `inverse_transform`, etc ) is itemized and explained. This should include operations with no data arguments, such as `training_losses` and `feature_importances`.
+- An *Operations* section: each implemented operation (`predict`,
+  `predict_mode`, `transform`, `inverse_transform`, etc.) is itemized and
+  explained. This should include operations with no data arguments, such as
+  `training_losses` and `feature_importances`.
 
-- A *Fitted parameters* section: To explain what is returned by `fitted_params(mach)` (the same as `MLJModelInterface.fitted_params(model, fitresult)` -  see later) with the fields of that named tuple itemized.
+- A *Fitted parameters* section: explains what is returned by `fitted_params(mach)`
+  (the same as `MLJModelInterface.fitted_params(model, fitresult)`; see later),
+  with the fields of that named tuple itemized.
 
-- A *Report* section (if `report` is non-empty): To explain what, if anything, is included in the `report(mach)`  (the same as the `report` return value of `MLJModelInterface.fit`) with the fields itemized.
+- A *Report* section (if `report` is non-empty): explains what, if anything,
+  is included in `report(mach)` (the same as the `report` return value of
+  `MLJModelInterface.fit`), with the fields itemized.
 
-- An optional but highly recommended *Examples* section, which includes MLJ examples, but which could also include others if the model type also implements a second "local" interface, i.e., defined in the same module. (Note that each module referring to a type can declare separate doc-strings which appear concatenated in doc-string queries.)
+- An optional but highly recommended *Examples* section, which includes MLJ
+  examples, but which could also include others if the model type also
+  implements a second "local" interface, i.e., defined in the same module. (Note
+  that each module referring to a type can declare separate doc-strings which
+  appear concatenated in doc-string queries.)
 
 - A closing *"See also"* sentence which includes a `@ref` link to the raw model type (if you are wrapping one).
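
The component list above can be sketched as a docstring skeleton. This is only an illustration under stated assumptions: `SomeModel`, `SomePackage`, the `lambda` hyperparameter, and the `coefficients` and `n_iterations` fields are hypothetical names, and the section headings simply mirror the list. The *Examples* section and the closing "See also" sentence are left out for brevity.

```julia
# Hypothetical model type, used only to illustrate the docstring layout.
struct SomeModel
    lambda::Float64
end
SomeModel(; lambda=1.0) = SomeModel(lambda)

# Docstring skeleton following the component order listed above.
const SOME_MODEL_DOC = """
    SomeModel

A short description of the model; hyperparameter details are deferred to
the "Hyperparameters" section.

From MLJ, the type can be imported using
`SomeModel = @load SomeModel pkg=SomePackage`.

Do `model = SomeModel()` to construct an instance with default
hyperparameters, or use keywords, as in `SomeModel(lambda=0.5)`.

# Training data

In MLJ, bind an instance `model` to data with `mach = machine(model, X, y)`,
where

- `X`: any table of input features with `Continuous` columns
- `y`: the target, an `AbstractVector` with `Continuous` elements

Train the machine using `fit!(mach)`.

# Hyperparameters

- `lambda=1.0`: (hypothetical) regularization strength

# Operations

- `predict(mach, Xnew)`: return predictions of the target for features `Xnew`

# Fitted parameters

The fields of `fitted_params(mach)` are:

- `coefficients`: (hypothetical) learned coefficients

# Report

The fields of `report(mach)` are:

- `n_iterations`: (hypothetical) number of solver iterations
"""

# Attach the string as the docstring of the type.
@doc SOME_MODEL_DOC SomeModel
```

Keeping the text in a constant makes it easy to check in tests that no required section has been dropped.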
 

diff --git a/docs/src/implementing_a_data_front_end.md b/docs/src/implementing_a_data_front_end.md
@@ -84,30 +84,34 @@ Suppose a supervised model type `SomeSupervised` supports sample
 weights, leading to two different `fit` signatures, and that it has a
 single operation `predict`:
 
-	fit(model::SomeSupervised, verbosity, X, y)
-	fit(model::SomeSupervised, verbosity, X, y, w)
+```julia
+fit(model::SomeSupervised, verbosity, X, y)
+fit(model::SomeSupervised, verbosity, X, y, w)
 
-	predict(model::SomeSupervised, fitresult, Xnew)
+predict(model::SomeSupervised, fitresult, Xnew)
+```
 
 Without a data front-end implemented, suppose `X` is expected to be a
 table and `y` a vector, but suppose the core algorithm always converts
 `X` to a matrix with features as rows (each record corresponds to
 a column in the table). Then a new data front-end might look like
 this:
 
-	constant MMI = MLJModelInterface
-
-	# for fit:
-	MMI.reformat(::SomeSupervised, X, y) = (MMI.matrix(X)', y)
-	MMI.reformat(::SomeSupervised, X, y, w) = (MMI.matrix(X)', y, w)
-	MMI.selectrows(::SomeSupervised, I, Xmatrix, y) =
-		(view(Xmatrix, :, I), view(y, I))
-	MMI.selectrows(::SomeSupervised, I, Xmatrix, y, w) =
-		(view(Xmatrix, :, I), view(y, I), view(w, I))
-
-	# for predict:
-	MMI.reformat(::SomeSupervised, X) = (MMI.matrix(X)',)
-	MMI.selectrows(::SomeSupervised, I, Xmatrix) = (view(Xmatrix, :, I),)
+```julia
+const MMI = MLJModelInterface
+
+# for fit:
+MMI.reformat(::SomeSupervised, X, y) = (MMI.matrix(X)', y)
+MMI.reformat(::SomeSupervised, X, y, w) = (MMI.matrix(X)', y, w)
+MMI.selectrows(::SomeSupervised, I, Xmatrix, y) =
+        (view(Xmatrix, :, I), view(y, I))
+MMI.selectrows(::SomeSupervised, I, Xmatrix, y, w) =
+        (view(Xmatrix, :, I), view(y, I), view(w, I))
+
+# for predict:
+MMI.reformat(::SomeSupervised, X) = (MMI.matrix(X)',)
+MMI.selectrows(::SomeSupervised, I, Xmatrix) = (view(Xmatrix, :, I),)
+```
 
 With these additions, `fit` and `predict` are refactored, so that `X`
 and `Xnew` represent matrices with features as rows.
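
To see what the front-end in this hunk does to the data, here is a minimal Base-only sketch. It assumes a plain matrix with observations as rows in place of a table, so no call to `MMI.matrix` is needed; `reformat_sketch` and `selectrows_sketch` are hypothetical stand-ins, not part of `MLJModelInterface`.

```julia
# Plain matrix standing in for a table: 3 observations (rows), 2 features (columns).
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [10, 20, 30]

# Stand-in for `MMI.reformat`: transpose so that features become rows.
reformat_sketch(X, y) = (X', y)

# Stand-in for `MMI.selectrows`: observations are now columns, so the
# index set `I` selects columns of the matrix and entries of `y`.
selectrows_sketch(I, Xmatrix, y) = (view(Xmatrix, :, I), view(y, I))

Xmatrix, ymodel = reformat_sketch(X, y)
Xsub, ysub = selectrows_sketch([1, 3], Xmatrix, ymodel)
```

Because the reformatted matrix stores one observation per column, resampling takes views of columns and does not repeat the table-to-matrix conversion.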
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,7 +1,7 @@
 # Adding Models for General Use
 
 The machine learning tools provided by
-[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) can be applied to the models in
+[MLJ](https://JuliaAI.github.io/MLJ.jl/dev/) can be applied to the models in
 any package that imports 
 [MLJModelInterface](https://github.com/JuliaAI/MLJModelInterface.jl) and implements the
 API defined there, as outlined in this document. 
@@ -15,7 +15,7 @@ or by a stand-alone "interface-only" package, using the template
 [MLJExampleInterface.jl](https://github.com/JuliaAI/MLJExampleInterface.jl) (see [Where to
 place code implementing new models](@ref) below). For a list of packages implementing the
 MLJ model API (natively, and in interface packages) see
-[here](https://alan-turing-institute.github.io/MLJ.jl/dev/list_of_supported_models/).
+[here](https://JuliaAI.github.io/MLJ.jl/dev/list_of_supported_models/).
 
 ## Important
 
@@ -31,7 +31,7 @@ project's [extras] and [targets]. In testing, simply use `MLJBase` in
 place of `MLJModelInterface`.
 
 It is assumed the reader has read the [Getting
-Started](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/) section of
+Started](https://JuliaAI.github.io/MLJ.jl/dev/getting_started/) section of
 the MLJ manual.  To implement the API described here, some familiarity with the following
 packages is also helpful:
 
@@ -52,5 +52,5 @@ packages is also helpful:
 In MLJ, the basic interface exposed to the user, built atop the model interface described
 here, is the *machine interface*. After a first reading of this document, the reader may
 wish to refer to [MLJ
-Internals](https://alan-turing-institute.github.io/MLJ.jl/dev/internals/) for context.
+Internals](https://JuliaAI.github.io/MLJ.jl/dev/internals/) for context.
 
diff --git a/docs/src/iterative_models.md b/docs/src/iterative_models.md
@@ -18,11 +18,11 @@ If an MLJ `Machine` is being `fit!` and it is not the first time, then `update`
 instead of `fit`, unless the machine `fit!` has been called with a new `rows` keyword
 argument. However, `MLJModelInterface` defines a fallback for `update` which just calls
 `fit`. For context, see the
-[Internals](https://alan-turing-institute.github.io/MLJ.jl/dev/internals/) section of the
+[Internals](https://JuliaAI.github.io/MLJ.jl/dev/internals/) section of the
 MLJ manual.
 
 Learning networks wrapped as models constitute one use case (see the [Composing
-Models](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/) section of
+Models](https://JuliaAI.github.io/MLJ.jl/dev/composing_models/) section of
 the MLJ manual): one would like each component model to be retrained only when
 hyperparameter changes "upstream" make this necessary. In this case, MLJ provides a
 fallback (specifically, the fallback is for any subtype of `SupervisedNetwork =

diff --git a/docs/src/quick_start_guide.md b/docs/src/quick_start_guide.md
@@ -18,11 +18,11 @@ understanding of how things work with MLJ.  In particular, you are familiar with
 - [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl), if working
   with finite discrete data, e.g., doing classification; see also the [Working with
   Categorical
-  Data](https://alan-turing-institute.github.io/MLJ.jl/dev/working_with_categorical_data/)
+  Data](https://JuliaAI.github.io/MLJ.jl/dev/working_with_categorical_data/)
   section of the MLJ manual.
 
 If you're not familiar with any one of these points, the [Getting
-Started](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/) section of
+Started](https://JuliaAI.github.io/MLJ.jl/dev/getting_started/) section of
 the MLJ manual may help.
 
 *But tables don't make sense for my model!* If a case can be made that
@@ -99,8 +99,7 @@ Further to the last point, `a::Float64 = 0.5::(_ > 0)` indicates that
 the field `a` is a `Float64`, takes `0.5` as its default value, and
 expects its value to be positive.
 
-Please see [this
-issue](https://github.com/JuliaAI/MLJBase.jl/issues/68)
+Please see [this issue](https://github.com/JuliaAI/MLJBase.jl/issues/68)
 for a known issue and workaround relating to the use of `@mlj_model`
 with negative defaults.
 
@@ -201,7 +200,7 @@ For a classifier, the steps are fairly similar to a regressor with these differe
 1. `y` will be a categorical vector and you will typically want to use
    the integer encoding of `y` instead of `CategoricalValue`s; use
    `MLJModelInterface.int` for this.
-1.  You will need to pass the full pool of target labels (not just
+2.  You will need to pass the full pool of target labels (not just
    those observed in the training data) and additionally, in the
    `Deterministic` case, the encoding, to make these available to
    `predict`. A simple way to do this is to pass `y[1]` in the
@@ -210,19 +209,19 @@ For a classifier, the steps are fairly similar to a regressor with these differe
    method for recovering categorical elements from their integer
    representations (e.g., `d(2)` is the categorical element with `2`
    as encoding).
-2. In the case of a *probabilistic* classifier you should pass all
+3. In the case of a *probabilistic* classifier you should pass all
    probabilities simultaneously to the [`UnivariateFinite`](@ref) constructor
    to get an abstract `UnivariateFinite` vector (type
    `UnivariateFiniteArray`) rather than use comprehension or
    broadcasting to get a vanilla vector. This is for performance
    reasons.
-   
+
 If implementing a classifier, you should probably consult the more
 detailed instructions at [The predict method](@ref).
 
 **Examples**:
 
--  GLM's [BinaryClassifier](https://github.com/JuliaAI/MLJModels.jl/blob/3687491b132be8493b6f7a322aedf66008caaab1/src/GLM.jl#L119-L131) (`Probabilistic`)
+- GLM's [BinaryClassifier](https://github.com/JuliaAI/MLJModels.jl/blob/3687491b132be8493b6f7a322aedf66008caaab1/src/GLM.jl#L119-L131) (`Probabilistic`)
 
 - LIBSVM's [SVC](https://github.com/JuliaAI/MLJModels.jl/blob/master/src/LIBSVM.jl) (`Deterministic`)
 
@@ -273,7 +272,7 @@ implementation creates:
   affect the outcome of training. It is okay to add "control"
   parameters (such as specifying an `acceleration` parameter specifying
   computational resources, as
-  [here](https://github.com/alan-turing-institute/MLJ.jl/blob/master/src/ensembles.jl#L193)).
+  [here](https://github.com/JuliaAI/MLJ.jl/blob/master/src/ensembles.jl#L193)).
 - Use `report` to return *everything else*, including model-specific
   *methods* (or other callable objects). This includes feature rankings,
   decision boundaries, SVM support vectors, clustering centres,
@@ -349,8 +348,8 @@ MLJModelInterface.metadata_model(YourModel1,
     output_scitype  = MLJModelInterface.Table(MLJModelInterface.Continuous),  # for an unsupervised, what output?
     supports_weights = false,                                                  # does the model support sample weights?
     descr   = "A short description of your model",
-	load_path    = "YourPackage.SubModuleContainingModelStructDefinition.YourModel1"
-    )
+    load_path    = "YourPackage.SubModuleContainingModelStructDefinition.YourModel1"
+)
 ```
 
 *Important.* Do not omit the `load_path` specification. Without a

diff --git a/docs/src/serialization.md b/docs/src/serialization.md
@@ -10,7 +10,7 @@ implemented in languages other than Julia.
 
 The MLJ user can serialize and deserialize machines, as she would any other Julia
 object. (This user has the option of first removing data from the machine. See the [Saving
-machines](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Saving-machines)
+machines](https://JuliaAI.github.io/MLJ.jl/dev/machines/#Saving-machines)
 section of the MLJ manual for details.) However, a problem can occur if a model's
 `fitresult` (see [The fit method](@ref)) is not a persistent object. For example, it might
 be a C pointer that would have no meaning in a new Julia session.

diff --git a/docs/src/static_models.md b/docs/src/static_models.md
@@ -2,7 +2,7 @@
 
 A model type subtypes `Static <: Unsupervised` if it does not generalize to new data but
 nevertheless has hyperparameters. See the [Static
-transformers](https://alan-turing-institute.github.io/MLJ.jl/dev/transformers/#Static-transformers)
+transformers](https://JuliaAI.github.io/MLJ.jl/dev/transformers/#Static-transformers)
 section of the MLJ manual for examples. In the `Static` case, `transform` can have
 multiple arguments and `input_scitype` refers to the allowed scitype of the slurped data,
 *even if there is only a single argument.* For example, if the signature is

diff --git a/docs/src/summary_of_methods.md b/docs/src/summary_of_methods.md
@@ -43,11 +43,11 @@ Optional, if `SomeSupervisedModel <: Probabilistic`:
 
 ```julia
 MMI.predict_mode(model::SomeSupervisedModel, fitresult, Xnew) =
-	mode.(predict(model, fitresult, Xnew))
+    mode.(predict(model, fitresult, Xnew))
 MMI.predict_mean(model::SomeSupervisedModel, fitresult, Xnew) =
-	mean.(predict(model, fitresult, Xnew))
+    mean.(predict(model, fitresult, Xnew))
 MMI.predict_median(model::SomeSupervisedModel, fitresult, Xnew) =
-	median.(predict(model, fitresult, Xnew))
+    median.(predict(model, fitresult, Xnew))
 ```
 
 Required, if the model is to be registered (findable by general users):

diff --git a/docs/src/supervised_models.md b/docs/src/supervised_models.md
@@ -19,15 +19,15 @@ The following sections were written with `Supervised` models in mind, but also c
 material relevant to general models:
 
 - [Summary of methods](@ref)
-- [The form of data for fitting and predicting](@ref) 
+- [The form of data for fitting and predicting](@ref)
 - [The fit method](@ref)
 - [The fitted_params method](@ref)
-- [The predict method](@ref) 
-- [The predict_joint method](@ref) 
-- [Training losses](@ref) 
-- [Feature importances](@ref) 
-- [Trait declarations](@ref) 
-- [Iterative models and the update! method](@ref) 
-- [Implementing a data front end](@ref) 
-- [Supervised models with a transform method](@ref) 
+- [The predict method](@ref)
+- [The predict_joint method](@ref)
+- [Training losses](@ref)
+- [Feature importances](@ref)
+- [Trait declarations](@ref)
+- [Iterative models and the update! method](@ref)
+- [Implementing a data front end](@ref)
+- [Supervised models with a transform method](@ref)
 - [Models that learn a probability distribution](@ref)
diff --git a/docs/src/the_fit_method.md b/docs/src/the_fit_method.md
@@ -7,21 +7,21 @@ MMI.fit(model::SomeSupervisedModel, verbosity, X, y) -> fitresult, cache, report
 ```
 
 1. `fitresult` is the fitresult in the sense above (which becomes an
-	argument for `predict` discussed below).
+    argument for `predict` discussed below).
 
 2.  `report` is a (possibly empty) `NamedTuple`, for example,
-	`report=(deviance=..., dof_residual=..., stderror=..., vcov=...)`.
-	Any training-related statistics, such as internal estimates of the
-	generalization error, and feature rankings, should be returned in
-	the `report` tuple. How, or if, these are generated should be
-	controlled by hyperparameters (the fields of `model`). Fitted
-	parameters, such as the coefficients of a linear model, do not go
-	in the report as they will be extractable from `fitresult` (and
-	accessible to MLJ through the `fitted_params` method described below).
-
-3.	The value of `cache` can be `nothing`, unless one is also defining
-	an `update` method (see below). The Julia type of `cache` is not
-	presently restricted.
+    `report=(deviance=..., dof_residual=..., stderror=..., vcov=...)`.
+    Any training-related statistics, such as internal estimates of the
+    generalization error, and feature rankings, should be returned in
+    the `report` tuple. How, or if, these are generated should be
+    controlled by hyperparameters (the fields of `model`). Fitted
+    parameters, such as the coefficients of a linear model, do not go
+    in the report as they will be extractable from `fitresult` (and
+    accessible to MLJ through the `fitted_params` method described below).
+
+3.  The value of `cache` can be `nothing`, unless one is also defining
+    an `update` method (see below). The Julia type of `cache` is not
+    presently restricted.
 
 !!! note
 

diff --git a/docs/src/the_predict_method.md b/docs/src/the_predict_method.md
@@ -6,8 +6,7 @@ A compulsory `predict` method has the form
 MMI.predict(model::SomeSupervisedModel, fitresult, Xnew) -> yhat
 ```
 
-Here `Xnew` will have the same form as the `X` passed to
-`fit`.
+Here `Xnew` will have the same form as the `X` passed to `fit`.
 
 Note that while `Xnew` generally consists of multiple observations
 (e.g., has multiple rows in the case of a table) it is assumed, in view of
@@ -44,26 +43,26 @@ may look something like this:
 
 ```julia
 function MMI.fit(model::SomeSupervisedModel, verbosity, X, y)
-	yint = MMI.int(y)
-	a_target_element = y[1]                # a CategoricalValue/String
-	decode = MMI.decoder(a_target_element) # can be called on integers
+    yint = MMI.int(y)
+    a_target_element = y[1]                # a CategoricalValue/String
+    decode = MMI.decoder(a_target_element) # can be called on integers
 
-	core_fitresult = SomePackage.fit(X, yint, verbosity=verbosity)
+    core_fitresult = SomePackage.fit(X, yint, verbosity=verbosity)
 
-	fitresult = (decode, core_fitresult)
-	cache = nothing
-	report = nothing
-	return fitresult, cache, report
+    fitresult = (decode, core_fitresult)
+    cache = nothing
+    report = nothing
+    return fitresult, cache, report
 end
 ```
 
 while a corresponding deterministic `predict` operation might look like this:
 
 ```julia
 function MMI.predict(model::SomeSupervisedModel, fitresult, Xnew)
-	decode, core_fitresult = fitresult
-	yhat = SomePackage.predict(core_fitresult, Xnew)
-	return decode.(yhat)
+    decode, core_fitresult = fitresult
+    yhat = SomePackage.predict(core_fitresult, Xnew)
+    return decode.(yhat)
 end
 ```
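
The encode/decode round trip used by the `fit` and `predict` pair above can be illustrated with Base only. In this sketch `encode` and `decode` are hypothetical stand-ins for `MMI.int` and for the callable returned by `MMI.decoder`, and `pool` stands in for the full pool of target labels; a real implementation would use the `MLJModelInterface` methods on a categorical vector.

```julia
# Full pool of target labels (hypothetical); "maybe" never occurs in training.
pool = ["no", "yes", "maybe"]
y = ["yes", "no", "yes"]

# Stand-in for `MMI.int`: position of a label in the pool.
encode(label) = findfirst(==(label), pool)

# Stand-in for the decoder: recover a label from its integer code.
decode(i::Integer) = pool[i]

yint = encode.(y)          # integer codes passed to the core algorithm
yhat_int = [3, 1]          # codes returned by a core `predict`
yhat = decode.(yhat_int)   # labels returned to the user
```

The decoder refers to the whole pool, so a predicted code can map to a label that was never observed in training.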
 
@@ -155,8 +154,8 @@ yhat = MLJModelInterface.UnivariateFinite([:FALSE, :TRUE], probs, augment=true,
 ```
 
 The constructor has a lot of options, including passing a dictionary
-instead of vectors. See
-`CategoricalDistributions.UnivariateFinite`](@ref) for details.
+instead of vectors. See [`CategoricalDistributions.UnivariateFinite`](@ref)
+for details.
 
 See
 [LinearBinaryClassifier](https://github.com/JuliaAI/MLJModels.jl/blob/master/src/GLM.jl)