How to Dynamically Tune the Number of Layers in a Neural Network Using mlr3torch? #285
Hey @iLivius and thanks for your interest in the package! I have a suggestion that might work for you. You can indirectly tune over the number of layers by defining 10 blocks (with their tuning parameters) and wrapping each of them in a `po("branch")` that selects between the block and a `po("nop")`. Below is some pseudocode that illustrates the idea:

```r
po("branch_1", list(po("nn_linear_1", out_features = to_tune(...)) %>>% po("nn_relu_1"), po("nop_1")), selection = to_tune()) %>>%
  po("branch_2", list(po("nn_linear_2", out_features = to_tune(...)) %>>% po("nn_relu_2"), po("nop_2")), selection = to_tune()) %>>%
  ...
  po("branch_<n>", list(po("nn_linear_<n>", out_features = to_tune(...)) %>>% po("nn_relu_<n>"), po("nop_<n>")), selection = to_tune()) %>>%
  po("nn_head") %>>%
  ...
```

Let me know whether that works for you!
Hi @sebffischer, thank you for your prompt and helpful response. Following your suggestion, I have tried to integrate your guidance into the following architecture, which seems to work for me:

```r
architecture <- po("torch_ingress_num") %>>%
# First branch
po("branch", options = c("block_1", "nop_1"), id = "branch_1", selection = to_tune()) %>>%
gunion(list(
block_1 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_1") %>>%
po("nn_relu", id = "relu_1") %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_1"),
nop_1 = po("nop", id = "nop_1")
)) %>>% po("unbranch", id = "unbranch_1") %>>%
# Second branch
po("branch", options = c("block_2", "nop_2"), id = "branch_2", selection = to_tune()) %>>%
gunion(list(
block_2 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_2") %>>%
po("nn_relu", id = "relu_2") %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_2"),
nop_2 = po("nop", id = "nop_2")
)) %>>% po("unbranch", id = "unbranch_2") %>>%
# Third branch
po("branch", options = c("block_3", "nop_3"), id = "branch_3", selection = to_tune()) %>>%
gunion(list(
block_3 = po("nn_linear", out_features = to_tune(p_int(32, 512)), id = "linear_3") %>>%
po("nn_relu", id = "relu_3") %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = "dropout_3"),
nop_3 = po("nop", id = "nop_3")
)) %>>% po("unbranch", id = "unbranch_3") %>>%
# Rest of the network
po("nn_head") %>>%
po("torch_loss", t_loss("cross_entropy")) %>>%
po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
  po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)
```

Please let me know whether this implementation aligns with your vision and is acceptable in its current form.
yes, this is what i had in mind!
Hi @sebffischer, thank you for your positive feedback. The approach for defining the architecture is working well so far. However, the tuning results report values for the hyperparameters of every block, including blocks whose branch selected the `nop` option:

```
INFO [11:35:55.338] [bbotk] Result:
INFO [11:35:55.338] [bbotk] branch_1.selection block_1.linear_1.out_features block_1.dropout_1.p branch_2.selection block_2.linear_2.out_features block_2.dropout_2.p branch_3.selection block_3.linear_3.out_features block_3.dropout_3.p
INFO [11:35:55.338] [bbotk] <char> <int> <num> <char> <int> <num> <char> <int> <num>
INFO [11:35:55.338] [bbotk] block_1 292 0.2882727 nop_2 315 0.1214148 nop_3 158 0.3017759
```

I'm wondering whether the hyperparameters reported for the deselected blocks (e.g. `block_2.linear_2.out_features` when `nop_2` is selected) have any effect on the resulting network. Thank you in advance for your insights!
So if I understand you correctly, your question is why e.g. `block_2.linear_2.out_features` is part of the tuning result even though `nop_2` was selected for the second branch. This parameter is sampled by the tuner anyway, but it has no effect on the network, because the block it configures is bypassed. I think the problem is that the graph does not understand that the parameter is only relevant for a specific branch selection, i.e. the dependency between the parameter and `branch_2.selection` is not created automatically. Maybe we want this on the issuetracker of mlr3pipelines.
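One way to declare such a dependency by hand (a sketch based on the parameter ids from the architecture above, reusing the `depends` mechanism that comes up later in this thread; untested):

```r
# only sample linear_2's width when branch_2 actually routes through block_2
architecture$param_set$set_values(
  block_2.linear_2.out_features = to_tune(p_int(
    32, 512,
    depends = branch_2.selection == "block_2"
  ))
)
```

An analogous `depends` condition would be needed for every tunable parameter inside an optional block.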
Thank you for the explanation, @sebffischer!
what i noticed / forgot to mention is that the approach that i have suggested here uses a non-uniform distribution for the number of layers, which might not be what you want: because each branch is switched on or off independently, the number of active layers is roughly binomially distributed, so intermediate depths are sampled far more often than very shallow or very deep networks. To solve this you could write down your search space manually using `ps()` with an `.extra_trafo` that samples the number of layers and switches the branches accordingly. sorry that i did not mention this before
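To make the non-uniformity concrete (a quick sanity check, assuming 10 branches whose on/off selections are sampled independently with probability 0.5 each):

```r
# probability of ending up with k active layers, k = 0..10
round(dbinom(0:10, size = 10, prob = 0.5), 3)
#> [1] 0.001 0.010 0.044 0.117 0.205 0.246 0.205 0.117 0.044 0.010 0.001
```

A 5-layer network would therefore be sampled roughly 250 times as often as a 10-layer one.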
would be covered by this issue I guess? mlr-org/mlr3pipelines#101
@iLivius You could also build something like this:

```r
library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")
block <- po("nn_linear", out_features = to_tune(p_int(32, 512))) %>>%
po("nn_relu") %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)))
numblocks <- 5
graph = NULL
for (i in seq_len(numblocks - 1)) {
unbranch_id <- paste0("unbranch_", i)
graph <- gunion(list(
po(unbranch_id, 2),
graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", i))
), in_place = TRUE)
graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}
graph = po("branch", numblocks) %>>!% graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", numblocks))
```

The resulting graph looks like this:

```r
graph$edges
#> src_id src_channel dst_id dst_channel
#> <char> <char> <char> <char>
#> 1: nn_linear_1 output nn_relu_1 input
#> 2: nn_relu_1 output nn_dropout_1 input
#> 3: nn_dropout_1 output unbranch_1 input2
#> 4: nn_linear_2 output nn_relu_2 input
#> 5: nn_relu_2 output nn_dropout_2 input
#> 6: unbranch_1 output nn_linear_2 input
#> 7: nn_dropout_2 output unbranch_2 input2
#> 8: nn_linear_3 output nn_relu_3 input
#> 9: nn_relu_3 output nn_dropout_3 input
#> 10: unbranch_2 output nn_linear_3 input
#> 11: nn_dropout_3 output unbranch_3 input2
#> 12: nn_linear_4 output nn_relu_4 input
#> 13: nn_relu_4 output nn_dropout_4 input
#> 14: unbranch_3 output nn_linear_4 input
#> 15: nn_dropout_4 output unbranch_4 input2
#> 16: branch output1 unbranch_4 input1
#> 17: branch output2 unbranch_3 input1
#> 18: branch output3 unbranch_2 input1
#> 19: branch output4 unbranch_1 input1
#> 20: branch output5 nn_linear_1 input
#> 21: nn_linear_5 output nn_relu_5 input
#> 22: nn_relu_5 output nn_dropout_5 input
#> 23: unbranch_4 output nn_linear_5 input
#>         src_id src_channel       dst_id dst_channel
```

You still need to set hyperparameter dependencies manually, unfortunately. You can use the `depends` argument inside the `to_tune()` tokens for this:

```r
library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")
block <- po("nn_linear") %>>%
po("nn_relu") %>>%
po("nn_dropout")
numblocks <- 5
graph = NULL
for (i in seq_len(numblocks - 1)) {
unbranch_id <- paste0("unbranch_", i)
curblock <- block$clone(deep = TRUE)
curblock$param_set$set_values(
nn_linear.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% (numblocks - i + 1):numblocks)),
nn_dropout.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% (numblocks - i + 1):numblocks))
)
curblock$update_ids(postfix = paste0("_", i))
graph <- gunion(list(
po(unbranch_id, 2),
graph %>>!% curblock
), in_place = TRUE)
graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}
# the last block is always active, so its parameters are tuned without a dependency
lastblock <- block$clone(deep = TRUE)
lastblock$param_set$set_values(
  nn_linear.out_features = to_tune(p_int(32, 512)),
  nn_dropout.p = to_tune(p_dbl(0.1, 0.5))
)
graph = po("branch", numblocks, selection = to_tune()) %>>!% graph %>>!% lastblock$update_ids(postfix = paste0("_", numblocks))
```

which is basically the same as doing:

```r
graph$param_set$set_values(
branch.selection = to_tune(),
nn_linear_1.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% 5)),
nn_dropout_1.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 5)),
nn_linear_2.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% 4:5)),
nn_dropout_2.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 4:5)),
nn_linear_3.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% 3:5)),
nn_dropout_3.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 3:5)),
nn_linear_4.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% 2:5)),
nn_dropout_4.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 2:5)),
nn_linear_5.out_features = to_tune(p_int(32, 512, depends = branch.selection %in% 1:5)),
nn_dropout_5.p = to_tune(p_dbl(0.1, 0.5, depends = branch.selection %in% 1:5))
)
```

You can run

```r
generate_design_random(graph$param_set$search_space(), 3)$transpose()
```

to see a few sample configurations that this generates, to verify that only the relevant hyperparameters are set.
(If you want to have fewer hyperparameters, e.g. have a single `out_features` value shared by all active layers, you could again handle this with an `.extra_trafo` in a manually defined search space.)
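A minimal sketch of that idea (untested; it assumes the `nn_linear_<i>` ids from the graph above, where branch selection `s` activates blocks `(6 - s)` to `5`):

```r
library("paradox")

# a single shared width, expanded to the per-layer parameters by the trafo
search_space <- ps(
  branch.selection = p_int(1, 5),
  out_features = p_int(32, 512),
  .extra_trafo = function(x, param_set) {
    for (i in (5 - x$branch.selection + 1):5) {  # indices of the active blocks
      x[[sprintf("nn_linear_%s.out_features", i)]] <- x$out_features
    }
    x$out_features <- NULL  # helper parameter, not part of the graph
    x
  }
)
```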
Thank you both for taking the game to the next level! I have tried to implement the solution suggested by @sebffischer, using a `ps()` search space with an `.extra_trafo`:

```r
# Define the maximum number of layers
max_layers <- 5
# Define the search space
search_space <- ps(
n_layers = p_int(1, max_layers),
.extra_trafo = function(x, param_set) {
for (i in 1:max_layers) {
if (i <= x$n_layers) {
x[[paste0("block_", i, "_selection")]] <- "on" # Activate the layer
} else {
x[[paste0("block_", i, "_selection")]] <- "off" # Deactivate the layer
}
}
return(x)
}
)
# Function to generate neural network layers with tunable parameters
generate_branch <- function(layer_num, block_selection) {
if (block_selection == "on") {
branch <- po("nn_linear", out_features = to_tune(p_int(32, 512)), id = paste0("linear_", layer_num)) %>>%
po("nn_relu", id = paste0("relu_", layer_num)) %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)), id = paste0("dropout_", layer_num))
} else {
branch <- po("nop", id = paste0("nop_", layer_num))
}
return(branch)
}
# Function to build the entire architecture based on the number of layers and block selections
generate_architecture <- function(n_layers, block_selections) {
architecture <- po("torch_ingress_num")
# Loop over the layers and add them to the architecture based on selections
for (i in 1:max_layers) {
architecture <- architecture %>>% generate_branch(i, block_selections[[paste0("block_", i, "_selection")]])
}
# Add the rest of the network
architecture <- architecture %>>%
po("nn_head") %>>%
po("torch_loss", t_loss("cross_entropy")) %>>%
po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)
return(architecture)
}
# Apply the transformation to extract the tuned values
trafo_params <- search_space$trafo(list(
n_layers = 3 # Example value for the number of layers
))
# Generate the architecture based on the tuned number of layers and block selections
architecture <- generate_architecture(trafo_params$n_layers, trafo_params)
```

This is how the tuned instance looks, based on the search space above:

```
INFO [09:25:35.375] [bbotk] Finished optimizing after 8 evaluation(s)
INFO [09:25:35.377] [bbotk] Result:
INFO [09:25:35.383] [bbotk] linear_1.out_features dropout_1.p linear_2.out_features dropout_2.p linear_3.out_features dropout_3.p torch_optimizer.lr torch_model_classif.epochs learner_param_vals x_domain classif.bacc
INFO [09:25:35.383] [bbotk] <int> <num> <int> <num> <int> <num> <num> <num> <list> <list> <num>
INFO [09:25:35.383] [bbotk] 283 0.2134847 43 0.3523094 510 0.4388965 0.002550026 125 <list[17]> <list[8]> 0.9851852
linear_1.out_features dropout_1.p linear_2.out_features dropout_2.p linear_3.out_features dropout_3.p torch_optimizer.lr torch_model_classif.epochs learner_param_vals x_domain classif.bacc
<int> <num> <int> <num> <int> <num> <num> <num> <list> <list> <num>
1: 283 0.2134847 43 0.3523094 510 0.4388965 0.002550026 125 <list[17]> <list[8]> 0.9851852
```

Below I tried to wrap up the solution provided by @mb706 in a reproducible example; please let me know if I nailed it:

```r
library("future")
library("mlr3hyperband")
library("mlr3pipelines")
library("mlr3torch")
library("mlr3tuning")
### Use built-in iris dataset for simplicity
tab <- iris
colnames(tab)[which(names(tab) == "Species")] <- "target"
tab$target <- as.factor(tab$target)
### Initialize classification task
task <- TaskClassif$new(id = "iris", backend = tab, target = "target")
# Build the graph object with tunable layers
block <- po("nn_linear", out_features = to_tune(p_int(32, 512))) %>>%
po("nn_relu") %>>%
po("nn_dropout", p = to_tune(p_dbl(0.1, 0.5)))
numblocks <- 5
graph = NULL
for (i in seq_len(numblocks - 1)) {
unbranch_id <- paste0("unbranch_", i)
graph <- gunion(list(
po(unbranch_id, 2),
graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", i))
), in_place = TRUE)
graph$add_edge(sprintf("%s_%s", block$rhs, i), unbranch_id, dst_channel = "input2")
}
graph = po("branch", numblocks) %>>!% graph %>>!% block$clone(deep = TRUE)$update_ids(postfix = paste0("_", numblocks))
# **Add torch_ingress_num to preprocess the input data**
graph <- po("torch_ingress_num") %>>% graph
# Add the final components for classification
graph <- graph %>>%
po("nn_head") %>>%
po("torch_loss", t_loss("cross_entropy")) %>>%
po("torch_optimizer", t_opt("adam", lr = to_tune(p_dbl(0.001, 0.01)))) %>>%
po("torch_model_classif", batch_size = 32, epochs = to_tune(p_int(100, 1000, tags = "budget")), device = "cpu", num_threads = 1)
# Convert the architecture to a learner object
learner <- as_learner(graph)
learner$id <- "iris"
learner$predict_type <- "prob"
# Set resampling
kfold = 5
inner_resampling <- rsmp("cv", folds = kfold)
# Define the tuning terminator and instance for optimization
terminator <- trm("evals", n_evals = 5)
instance <- ti(
task = task, learner = learner, resampling = inner_resampling,
measure = msr("classif.bacc"), terminator = terminator
)
# Optimize using Hyperband tuner
future::plan(multisession, workers = 24)
tuner <- tnr("hyperband", eta = 2, repetitions = 1)
tuner$optimize(instance)
```

The tuned instance looks like the following:

```
INFO [09:50:09.572] [bbotk] Finished optimizing after 8 evaluation(s)
INFO [09:50:09.574] [bbotk] Result:
INFO [09:50:09.579] [bbotk] nn_linear_1.out_features nn_dropout_1.p nn_linear_2.out_features nn_dropout_2.p nn_linear_3.out_features nn_dropout_3.p nn_linear_4.out_features nn_dropout_4.p nn_linear_5.out_features nn_dropout_5.p
INFO [09:50:09.579] [bbotk] <int> <num> <int> <num> <int> <num> <int> <num> <int> <num>
INFO [09:50:09.579] [bbotk] 93 0.4843554 453 0.3147442 78 0.3420649 329 0.387233 494 0.3151132
INFO [09:50:09.579] [bbotk] torch_optimizer.lr torch_model_classif.epochs learner_param_vals x_domain classif.bacc
INFO [09:50:09.579] [bbotk] <num> <num> <list> <list> <num>
INFO [09:50:09.579] [bbotk] 0.002098667 125 <list[22]> <list[12]> 0.9833333
nn_linear_1.out_features nn_dropout_1.p nn_linear_2.out_features nn_dropout_2.p nn_linear_3.out_features nn_dropout_3.p nn_linear_4.out_features nn_dropout_4.p nn_linear_5.out_features nn_dropout_5.p
<int> <num> <int> <num> <int> <num> <int> <num> <int> <num>
1: 93 0.4843554 453 0.3147442 78 0.3420649 329 0.387233 494 0.3151132
torch_optimizer.lr torch_model_classif.epochs learner_param_vals x_domain classif.bacc
<num> <num> <list> <list> <num>
1: 0.002098667 125 <list[22]> <list[12]> 0.9833333
```
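As a possible follow-up once tuning has finished (a sketch, assuming `learner`, `task`, and `instance` are the objects from the example above):

```r
# apply the best configuration found by the tuner and fit a final model
learner$param_set$set_values(.values = instance$result_learner_param_vals)
learner$train(task)
```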
If you have such a custom parameter, the whole search space needs to be defined as a `ps()` object, instead of attaching `to_tune()` tokens to the graph. This means that the search space would look something like:

```r
search_space <- ps(
branch_1.nn_linear_1.out_features = p_int(100, 200),
branch_1.nn_dropout_1.p = p_dbl(0, 1),
...
branch_n.nn_linear_n.out_features = p_int(100, 200),
branch_n.nn_dropout_n.p = p_dbl(0, 1),
...
n_layers = p_int(1, max_layers),
.extra_trafo = function(x, param_set) {
for (i in 1:max_layers) {
if (i <= x$n_layers) {
x[[paste0("block_", i, "_selection")]] <- "on" # Activate the layer
} else {
x[[paste0("block_", i, "_selection")]] <- "off" # Deactivate the layer
}
}
return(x)
}
)
```

It is a bit more cumbersome to write down the search space like this, but it also gives you more flexibility.
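For completeness, a search space defined this way is passed to the tuning instance explicitly, rather than being collected from `to_tune()` tokens in the learner (a sketch, assuming the `task` and `learner` from the reproducible example above):

```r
# hand the explicit search space to the tuning instance
instance <- ti(
  task = task, learner = learner, resampling = rsmp("cv", folds = 5),
  measure = msr("classif.bacc"), terminator = trm("evals", n_evals = 5),
  search_space = search_space
)
```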
I think it would be good to add a solution to this problem as a predefined graph, so that it is available via `ppl()`.
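Purely as an illustration of what that could look like (the pipeline key and arguments below are invented; nothing like this exists in `mlr3pipelines` or `mlr3torch` yet):

```r
# hypothetical API sketch -- not an existing ppl() key
graph <- ppl("nn_tunable_depth", block = block, max_blocks = 10)
```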
I would like to build a neural network with a tunable number of layers. While I can tune the number of neurons per layer, I'm encountering issues when it comes to dynamically changing the number of layers.

Initially, I thought I could handle this using `po("nn_block")`. However, I understood that `nn_block` is more suited for repeating a segment of the network multiple times. My goal is to be able to tune the number of layers, from 1 to a maximum value, while maintaining the ability to tune the number of neurons in each layer.

Here's a minimal reproducible example that demonstrates my current approach: stackoverflow