-
Hi, I am trying to do classification on a TRUE/FALSE target. Before running on the data set, I clean it so that it contains only complete cases; however, when trying to optimise I get an error about missing values. Here is a sample of the code:

```r
d <- fread("datasets/training_data.csv")

# Filter training data: keep only the predictor columns
training_data <- d[, c(predictors_cols), with = FALSE]

xgb_learner <- lrn("classif.xgboost", eval_metric = "logloss")

traintask <- TaskClassif$new(id = "training_data",
                             backend = training_data[, -remove_col, with = FALSE],
                             target = target)

train_indexes <- list(train = c(1, 2, 3, 4, 5), test = c(6, 7, 8, 9, 10))
# Set parameter space
XGB_parameters <- ps(
  eta = p_dbl(default = 0.05, lower = 0.001, upper = 0.1),
  max_depth = p_int(default = 6L, lower = 3L, upper = 15L),
  nrounds = p_int(default = 50L, lower = 5L, upper = 100L),
  gamma = p_dbl(default = 7, lower = 4, upper = 17),
  colsample_bytree = p_dbl(default = 0.15, lower = 0.05, upper = 0.25),
  subsample = p_dbl(default = 0.15, lower = 0.01, upper = 0.25),
  min_child_weight = p_dbl(default = 1, lower = 0, upper = 3),
  booster = p_fct(levels = c("dart")),
  # Parameters specific to DART
  rate_drop = p_dbl(default = 0, lower = 0, upper = 1, tags = "train"),
  skip_drop = p_dbl(default = 0, lower = 0, upper = 1, tags = "train")
)
XGB_parameters$add_dep("skip_drop", "booster", CondEqual$new("dart"))
XGB_parameters$add_dep("rate_drop", "booster", CondEqual$new("dart"))
XGB_parameters
tuner <- tnr("random_search")
rc <- rsmp("custom")
# One custom iteration: train on the rows in train_indexes$train, test on train_indexes$test
rc$instantiate(traintask, list(train_indexes$train), list(train_indexes$test))
measure <- msr("classif.fbeta")
term_combo <- trm("combo",
  list(trm("evals", n_evals = 500),
       trm("perf_reached", level = 0.9)),
  any = TRUE)
# Instantiate tuning
instance <- TuningInstanceSingleCrit$new(
task = traintask,
learner = xgb_learner,
resampling = rc,
measure = measure,
search_space = XGB_parameters,
terminator = term_combo
)
tuner$optimize(instance)
```

The optimisation fails with:

```
Error in .__Archive__add_evals(self = self, private = private, super = super, :
  Assertion on 'ydt[, self$cols_y, with = FALSE]' failed: Contains missing values (column 'classif.fbeta', row 1).
```

For some reason, the error goes away when I change the metric to a different one.
-
The error states that the measure returned a missing value. Reproducible example:

```r
library(mlr3)
library(mlr3learners)

set.seed(2)

task = tsk("pima")
learner = lrn("classif.xgboost", eval_metric = "logloss")

indices = list(train = list(1, 2, 3, 4, 5), test = list(6, 7, 8, 9, 10))
rc = rsmp("custom")
rc$instantiate(task, indices$train, indices$test)

rr = resample(task, learner, rc)
rr$score(msr("classif.fbeta"))
```
-
The small custom resampling might be the cause.
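If plain cross-validation is an option instead of the custom split, one way to make single-class test sets much less likely is to stratify the resampling by the target. A minimal sketch, assuming the task from the question (`traintask`) and that stratified 5-fold CV is acceptable:

```r
# Assumption: ordinary CV is acceptable instead of the custom train/test split.
# Stratifying by the target keeps the class ratio roughly constant per fold.
traintask$col_roles$stratum <- traintask$target_names

rc_strat <- rsmp("cv", folds = 5)
rc_strat$instantiate(traintask)
```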
-
I think I found the source of the problem. I have a very imbalanced dataset, so some of the cross-validation resamples contain only TRUE or only FALSE labels. That means the measure cannot be calculated. Here is a reproducible example:

```r
library(mlr3)
library(mlr3learners)
library(data.table)

set.seed(2)

d <- data.table(x = rnorm(100),
                label = sample(c(TRUE, FALSE), size = 100, replace = TRUE,
                               prob = c(0.05, 0.95)))
d[, label := as.factor(label)]

traintask <- TaskClassif$new(id = "d", backend = d, target = "label")
learner <- lrn("classif.xgboost", eval_metric = "logloss")

indices <- list(train = list(1, 2, 3, 4, 5), test = list(6, 7, 8, 9, 10))
rc <- rsmp("custom")
rc$instantiate(traintask, indices$train, indices$test)

rr <- resample(traintask, learner, rc)
rr$score(msr("classif.fbeta"))
```

The output shows several NaN values.
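To double-check that single-class folds are really the culprit, here is a small sketch (reusing `traintask` and the instantiated `rc` from the example above) that lists the classes present in each test set:

```r
# Tabulate the true labels of every test set; folds whose table contains only
# one class are the ones for which classif.fbeta cannot be computed.
lapply(seq_len(rc$iters), function(i) table(traintask$truth(rc$test_set(i))))
```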
-
@be-marc Can you improve the error message?
-
Okay, I think I have been able to solve it, but I would like to hear whether you think it makes sense. What I have done is add the na_value argument:

```r
msr("classif.sensitivity", na_value = 0)
```

However, I am not sure whether I should use 0 or 1 as the NA substitute. At least the hyperparameter tuning is running normally now.
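Presumably the same pattern applies to the original measure from the question; a minimal sketch, under the assumption that scoring an undefined fold as 0 (the worst case) is acceptable here:

```r
# Assumption: substituting 0 for folds where F-beta is undefined is acceptable.
# The tuner then receives a finite value instead of NaN and can keep optimising.
measure <- msr("classif.fbeta", na_value = 0)
```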
-
Well, it depends. Here are some good reads on this topic:

There are (at least) two ways to tackle this problem. One of them is to micro-average the measure over the resampling iterations, i.e. pool the predictions of all iterations and compute the score once, so an individual single-class fold no longer produces a NaN on its own:

```r
msr("classif.f1", average = "micro")
```
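A minimal sketch of how this could be wired into the tuning setup from the question, assuming the objects defined there and that F-beta is still the target metric:

```r
# Assumption: traintask, xgb_learner, rc, XGB_parameters, term_combo and tuner
# are the objects from the original post; average = "micro" pools the
# predictions of all resampling iterations before the score is computed.
measure <- msr("classif.fbeta", average = "micro")

instance <- TuningInstanceSingleCrit$new(
  task = traintask,
  learner = xgb_learner,
  resampling = rc,
  measure = measure,
  search_space = XGB_parameters,
  terminator = term_combo
)
tuner$optimize(instance)
```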