Updates now that MLJ.jl has been moved to the JuliaAI GitHub organization #1113

Merged
merged 1 commit on May 5, 2024
347 changes: 237 additions & 110 deletions examples/lightning_tour/lightning_tour.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion examples/lightning_tour/lightning_tour.jl
@@ -1,7 +1,7 @@
# # Lightning tour of MLJ

# *For a more elementary introduction to MLJ, see [Getting
-# Started](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/).*
+# Started](https://juliaai.github.io/MLJ.jl/dev/getting_started/).*

# **Note.** Be sure this file has not been separated from the
# accompanying Project.toml and Manifest.toml files, which should not
24 changes: 12 additions & 12 deletions examples/telco/notebook.ipynb
@@ -12,7 +12,7 @@
"metadata": {},
"source": [
"An application of the [MLJ\n",
"toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the\n",
"toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the\n",
"Telco Customer Churn dataset, aimed at practicing data scientists\n",
"new to MLJ (Machine Learning in Julia). This tutorial does not\n",
"cover exploratory data analysis."
@@ -31,9 +31,9 @@
"metadata": {},
"source": [
"For other MLJ learning resources see the [Learning\n",
"MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)\n",
"MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)\n",
"section of the\n",
"[manual](https://alan-turing-institute.github.io/MLJ.jl/dev/)."
"[manual](https://juliaai.github.io/MLJ.jl/dev/)."
]
},
{
@@ -132,7 +132,7 @@
"the notebook, package instantiation and pre-compilation may take a\n",
"minute or so to complete. **This step will fail** if the [correct\n",
"Manifest.toml and Project.toml\n",
"files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)\n",
"files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)\n",
"are not in the same directory as this notebook."
]
},
@@ -203,7 +203,7 @@
"metadata": {},
"source": [
"This section is a condensed adaption of the [Getting Started\n",
"example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)\n",
"example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)\n",
"in the MLJ documentation."
]
},
@@ -448,7 +448,7 @@
"metadata": {},
"source": [
"A machine stores some other information enabling [warm\n",
"restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)\n",
"restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)\n",
"for some models, but we won't go into that here. You are allowed to\n",
"access and mutate the `model` parameter:"
]
@@ -1140,7 +1140,7 @@
"metadata": {},
"source": [
"For tools helping us to identify suitable models, see the [Model\n",
"Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)\n",
"Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)\n",
"section of the manual. We will build a gradient tree-boosting model,\n",
"a popular first choice for structured data like we have here. Model\n",
"code is contained in a third-party package called\n",
@@ -1379,7 +1379,7 @@
"source": [
"Note that the component models appear as hyper-parameters of\n",
"`pipe`. Pipelines are an implementation of a more general [model\n",
"composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)\n",
"composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)\n",
"interface provided by MLJ that advanced users may want to learn about."
]
},
@@ -2152,7 +2152,7 @@
"metadata": {},
"source": [
"We choose a `StratifiedCV` resampling strategy; the complete list of options is\n",
"[here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies)."
"[here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies)."
]
},
{
@@ -2393,7 +2393,7 @@
"metadata": {},
"source": [
"First, we select appropriate controls from [this\n",
"list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):"
"list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):"
]
},
{
@@ -2559,7 +2559,7 @@
"wanting to visualize the effect of changes to a *single*\n",
"hyper-parameter (which could be an iteration parameter). See, for\n",
"example, [this section of the\n",
"manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)\n",
"manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)\n",
"or [this\n",
"tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb)."
]
@@ -2689,7 +2689,7 @@
"metadata": {},
"source": [
"Next, we choose an optimization strategy from [this\n",
"list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):"
"list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):"
]
},
{
25 changes: 12 additions & 13 deletions examples/telco/notebook.jl
@@ -1,7 +1,7 @@
# # MLJ for Data Scientists in Two Hours

# An application of the [MLJ
-# toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the
+# toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the
# Telco Customer Churn dataset, aimed at practicing data scientists
# new to MLJ (Machine Learning in Julia). This tutorial does not
# cover exploratory data analysis.
@@ -10,9 +10,9 @@
# deep-learning).

# For other MLJ learning resources see the [Learning
-# MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)
+# MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)
# section of the
-# [manual](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+# [manual](https://juliaai.github.io/MLJ.jl/dev/).

# **Topics covered**: Grabbing and preparing a dataset, basic
# fit/predict workflow, constructing a pipeline to include data
@@ -78,7 +78,7 @@
# the notebook, package instantiation and pre-compilation may take a
# minute or so to complete. **This step will fail** if the [correct
# Manifest.toml and Project.toml
-# files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)
+# files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)
# are not in the same directory as this notebook.

using Pkg
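
The remaining setup lines are collapsed in this diff. A minimal sketch of that step, assuming the script sits beside the Project.toml and Manifest.toml mentioned above:

using Pkg
Pkg.activate(@__DIR__) # use the environment defined by the adjacent Project.toml
Pkg.instantiate()      # install the exact package versions pinned in Manifest.toml
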
@@ -94,7 +94,7 @@ Pkg.instantiate()
# don't fully grasp should become clearer in the Telco study.

# This section is a condensed adaptation of the [Getting Started
-# example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
+# example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
# in the MLJ documentation.

# First, using the built-in iris dataset, we load and inspect the features
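
The loading code itself is collapsed here. A hedged sketch of how the iris features and target are typically grabbed and inspected in MLJ:

using MLJ
X, y = @load_iris # built-in demo dataset, as (features, target)
schema(X)         # scientific types of the feature columns
levels(y)         # the three species in the target
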
@@ -137,7 +137,7 @@ fit!(mach, rows=train_rows)
fitted_params(mach)

# A machine stores some other information enabling [warm
-# restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)
+# restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)
# for some models, but we won't go into that here. You are allowed to
# access and mutate the `model` parameter:
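
The demonstration itself is collapsed; a minimal sketch of the idea, with an illustrative hyperparameter name (the tutorial's actual model may expose different ones):

mach.model.max_depth = 3    # mutate a hyperparameter of the wrapped model
fit!(mach, rows=train_rows) # refit; models supporting warm restarts reuse earlier work
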

@@ -292,7 +292,7 @@ const ytest, Xtest = unpack(df_test, ==(:Churn), !=(:customerID));
# > Introduces: `@load`, `input_scitype`, `target_scitype`

# For tools helping us to identify suitable models, see the [Model
-# Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)
+# Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)
# section of the manual. We will build a gradient tree-boosting model,
# a popular first choice for structured data like we have here. Model
# code is contained in a third-party package called
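
The package name is cut off by the diff; judging from the `evo_tree_classifier` hyperparameter appearing later in this PR, it is presumably EvoTrees.jl. A hedged sketch of the search-and-load workflow:

models(matching(X, y))                          # models compatible with the data's scitypes
Booster = @load EvoTreeClassifier pkg=EvoTrees  # import the model code
booster = Booster(nrounds=50)                   # instantiate; `nrounds` is the iteration parameter
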
@@ -340,7 +340,7 @@ pipe = ContinuousEncoder() |> booster

# Note that the component models appear as hyperparameters of
# `pipe`. Pipelines are an implementation of a more general [model
-# composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
+# composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
# interface provided by MLJ that advanced users may want to learn about.

# From the above display, we see that component model hyperparameters
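
A small sketch of that nesting (the field name follows MLJ's convention of snake-casing the component type; the value is illustrative):

pipe = ContinuousEncoder() |> booster
pipe.evo_tree_classifier.max_depth = 4 # a component hyperparameter, reachable by name
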
@@ -464,7 +464,7 @@ plot!([0, 1], [0, 1], linewidth=2, linestyle=:dash, color=:black)
# `acceleration=CPUThreads()` to parallelize the computation.

# We choose a `StratifiedCV` resampling strategy; the complete list of options is
-# [here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
+# [here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).

e_pipe = evaluate(pipe, X, y,
resampling=StratifiedCV(nfolds=6, rng=123),
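
The call is truncated by the diff; a sketch of how it plausibly continues, with measures assumed from the metrics reported elsewhere in the tutorial:

e_pipe = evaluate(pipe, X, y,
                  resampling=StratifiedCV(nfolds=6, rng=123),
                  measures=[brier_loss, auc, accuracy],
                  acceleration=CPUThreads()) # parallelize over folds
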
@@ -535,7 +535,7 @@ pipe2 = ContinuousEncoder() |>
# [MLJFlux.jl](https://github.com/FluxML/MLJFlux.jl).

# First, we select appropriate controls from [this
-# list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
+# list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):

controls = [
Step(1), # to increment iteration parameter (`pipe.nrounds`)
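
The rest of the control list and the wrapping step are collapsed; a representative sketch (the specific stopping criteria are assumptions):

controls = [
    Step(1),              # increment the iteration parameter (`pipe.nrounds`)
    NumberSinceBest(n=4), # stop when the out-of-sample loss stops improving
    TimeLimit(t=2/3600),  # and, in any case, after about two seconds
]
iterated_pipe = IteratedModel(model=pipe2,
                              controls=controls,
                              measure=brier_loss,
                              resampling=Holdout(fraction_train=0.7))
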
@@ -580,7 +580,7 @@ fit!(mach_iterated_pipe);
# wanting to visualize the effect of changes to a *single*
# hyperparameter (which could be an iteration parameter). See, for
# example, [this section of the
-# manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)
+# manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)
# or [this
# tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb).
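
A hypothetical `learning_curve` call in the spirit of this passage (the range and names are illustrative, not the tutorial's):

r = range(pipe2, :(evo_tree_classifier.nrounds), lower=50, upper=500)
curve = learning_curve(machine(pipe2, X, y),
                       range=r, resampling=CV(nfolds=4), measure=brier_loss)
# plotting curve.parameter_values against curve.measurements gives the curve
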

@@ -618,7 +618,7 @@ r2 = range(iterated_pipe, p2, lower=2, upper=6)
# and `upper`.

# Next, we choose an optimization strategy from [this
-# list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
+# list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):

tuning = RandomSearch(rng=123)
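
What typically follows (collapsed here) is wrapping the model in a self-tuning `TunedModel` over the two ranges defined above; a sketch with assumed settings:

tuned_iterated_pipe = TunedModel(model=iterated_pipe,
                                 tuning=tuning, # the RandomSearch above
                                 range=[r1, r2],
                                 resampling=StratifiedCV(nfolds=6, rng=123),
                                 measure=brier_loss,
                                 n=40)          # number of candidate models to sample
mach_tuned = machine(tuned_iterated_pipe, X, y)
fit!(mach_tuned) # runs the search, then retrains the best model on all supplied data
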

@@ -755,4 +755,3 @@ ŷ_basic = predict(mach_basic, Xtest);
auc(ŷ_basic, ytest),
accuracy(mode.(ŷ_basic), ytest)
)

36 changes: 18 additions & 18 deletions examples/telco/notebook.pluto.jl
@@ -10,7 +10,7 @@ md"# MLJ for Data Scientists in Two Hours"
# ╔═╡ 8a6670b8-96a8-4a5d-b795-033f6f2a0674
md"""
An application of the [MLJ
-toolbox](https://alan-turing-institute.github.io/MLJ.jl/dev/) to the
+toolbox](https://juliaai.github.io/MLJ.jl/dev/) to the
Telco Customer Churn dataset, aimed at practicing data scientists
new to MLJ (Machine Learning in Julia). This tutorial does not
cover exploratory data analysis.
@@ -25,9 +25,9 @@ deep-learning).
# ╔═╡ b04c4790-59e0-42a3-af2a-25235e544a31
md"""
For other MLJ learning resources see the [Learning
-MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_mlj/)
+MLJ](https://juliaai.github.io/MLJ.jl/dev/learning_mlj/)
section of the
-[manual](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+[manual](https://juliaai.github.io/MLJ.jl/dev/).
"""

# ╔═╡ 4eb8dff4-c23a-4b41-8af5-148d95ea2900
@@ -106,7 +106,7 @@ used to develop this tutorial. If this is your first time running
the notebook, package instantiation and pre-compilation may take a
minute or so to complete. **This step will fail** if the [correct
Manifest.toml and Project.toml
-files](https://github.com/alan-turing-institute/MLJ.jl/tree/dev/examples/telco)
+files](https://github.com/JuliaAI/MLJ.jl/tree/dev/examples/telco)
are not in the same directory as this notebook.
"""

@@ -131,7 +131,7 @@ don't fully grasp should become clearer in the Telco study.
# ╔═╡ 33ca287e-8cba-47d1-a0de-1721c1bc2df2
md"""
This section is a condensed adaptation of the [Getting Started
-example](https://alan-turing-institute.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
+example](https://juliaai.github.io/MLJ.jl/dev/getting_started/#Fit-and-predict)
in the MLJ documentation.
"""

@@ -197,7 +197,7 @@ end
# ╔═╡ 0f978839-cc95-4c3a-8a29-32f11452654a
md"""
A machine stores some other information enabling [warm
-restart](https://alan-turing-institute.github.io/MLJ.jl/dev/machines/#Warm-restarts)
+restart](https://juliaai.github.io/MLJ.jl/dev/machines/#Warm-restarts)
for some models, but we won't go into that here. You are allowed to
access and mutate the `model` parameter:
"""
@@ -324,7 +324,7 @@ begin
return x
end
end

df0.TotalCharges = fix_blanks(df0.TotalCharges);
end
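
The body of `fix_blanks` is collapsed above. A guess at the cleaning step (the replacement value and the parse are assumptions, not the tutorial's verbatim code):

fix_blanks(v) = map(x -> x == " " ? "0.0" : x, v) # hypothetical helper body
df0.TotalCharges = parse.(Float64, fix_blanks(df0.TotalCharges))
coerce!(df0, :TotalCharges => Continuous)         # tag the column's scitype
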

@@ -424,7 +424,7 @@ md"> Introduces: `@load`, `input_scitype`, `target_scitype`"
# ╔═╡ f97969e2-c15c-42cf-a6fa-eaf14df5d44b
md"""
For tools helping us to identify suitable models, see the [Model
-Search](https://alan-turing-institute.github.io/MLJ.jl/dev/model_search/#model_search)
+Search](https://juliaai.github.io/MLJ.jl/dev/model_search/#model_search)
section of the manual. We will build a gradient tree-boosting model,
a popular first choice for structured data like we have here. Model
code is contained in a third-party package called
@@ -497,7 +497,7 @@ pipe = ContinuousEncoder() |> booster
md"""
Note that the component models appear as hyperparameters of
`pipe`. Pipelines are an implementation of a more general [model
-composition](https://alan-turing-institute.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
+composition](https://juliaai.github.io/MLJ.jl/dev/composing_models/#Composing-Models)
interface provided by MLJ that advanced users may want to learn about.
"""

@@ -693,7 +693,7 @@ observation space, for a total of 18 folds) and set
# ╔═╡ 562887bb-b7fb-430f-b61c-748aec38e674
md"""
We choose a `StratifiedCV` resampling strategy; the complete list of options is
-[here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
+[here](https://juliaai.github.io/MLJ.jl/dev/evaluating_model_performance/#Built-in-resampling-strategies).
"""

# ╔═╡ f9be989e-2604-44c2-9727-ed822e4fd85d
Expand Down Expand Up @@ -734,7 +734,7 @@ begin
table = (measure=measure, measurement=measurement)
return DataFrames.DataFrame(table)
end

const confidence_intervals_basic_model = confidence_intervals(e_pipe)
end

@@ -753,7 +753,7 @@ with low feature importance, to speed up later optimization:
# ╔═╡ cdfe840d-4e87-467f-b582-dfcbeb05bcc5
begin
unimportant_features = filter(:importance => <(0.005), feature_importance_table).feature

pipe2 = ContinuousEncoder() |>
FeatureSelector(features=unimportant_features, ignore=true) |> booster
end
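
`feature_importance_table` is defined outside the visible hunks; one plausible construction, labelled hypothetical, via MLJ's `feature_importances`:

fi = feature_importances(mach_pipe) # hypothetical machine name; returns feature => importance pairs
feature_importance_table = DataFrames.DataFrame(feature=first.(fi), importance=last.(fi))
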
@@ -790,7 +790,7 @@ eg, the neural network models provided by
# ╔═╡ 8fc99d35-d8cc-455f-806e-1bc580dc349d
md"""
First, we select appropriate controls from [this
-list](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
+list](https://juliaai.github.io/MLJ.jl/dev/controlling_iterative_models/#Controls-provided):
"""

# ╔═╡ 29f33708-4a82-4acc-9703-288eae064e2a
@@ -857,7 +857,7 @@ here is the `learning_curve` function, which can be useful when
wanting to visualize the effect of changes to a *single*
hyperparameter (which could be an iteration parameter). See, for
example, [this section of the
-manual](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_curves/)
+manual](https://juliaai.github.io/MLJ.jl/dev/learning_curves/)
or [this
tutorial](https://github.com/ablaom/MLJTutorial.jl/blob/dev/notebooks/04_tuning/notebook.ipynb).
"""
@@ -898,7 +898,7 @@ show(iterated_pipe, 2)
begin
p1 = :(model.evo_tree_classifier.η)
p2 = :(model.evo_tree_classifier.max_depth)

r1 = range(iterated_pipe, p1, lower=-2, upper=-0.5, scale=x->10^x)
r2 = range(iterated_pipe, p2, lower=2, upper=6)
end
@@ -912,7 +912,7 @@ and `upper`.
# ╔═╡ af3023e6-920f-478d-af76-60dddeecbe6c
md"""
Next, we choose an optimization strategy from [this
-list](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
+list](https://juliaai.github.io/MLJ.jl/dev/tuning_models/#Tuning-Models):
"""

# ╔═╡ 93c17a9b-b49c-4780-9074-c069a0e97d7e
@@ -1105,9 +1105,9 @@ md"For comparison, here's the performance for the basic pipeline model"
begin
mach_basic = machine(pipe, X, y)
fit!(mach_basic, verbosity=0)

ŷ_basic = predict(mach_basic, Xtest);

@info("Basic model measurements on test set:",
brier_loss(ŷ_basic, ytest) |> mean,
auc(ŷ_basic, ytest),