Automatically predict NA for rows w/ NAs and learners that don't support missings #2099

mb706 · 2017-12-09T19:48:35Z

A comprehensive fix for the larger issue around #1515. This does what I described in my comment on #2068:
If the learner doesn't support 'missings', rows containing missing values are stripped before prediction, and NAs are inserted into the prediction in their place. This has two weaknesses:

Apparently there are Learners that don't support "missings" in their training data but do support them in the prediction data. There is no easy fix for this; one could add another Learner property, or redefine "missings" to mean support for missings in the prediction data.
If every row of the input contains at least one NA, this falls back to the old prediction mode (and possibly raises an error if the Learner doesn't silently ignore NAs). A more thorough implementation could create the matrix / vector of NAs of the appropriate type without calling predictLearner at all.
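
The strip-and-refill idea described above can be sketched in base R. This is only an illustration, not the code in this PR: `predict_with_na_rows` and `toy_predict` are hypothetical names, and `toy_predict` stands in for a learner whose prediction function cannot handle missing values. Unlike the PR, which falls back to the old prediction mode when every row contains an NA, the sketch simply returns untyped NAs in that case.

```r
# Hypothetical helper (not part of mlr): predict on complete rows only and
# put NA in place of the rows that contained missing values.
predict_with_na_rows = function(predict_fun, newdata) {
  # is.na() on a data.frame returns a logical matrix, so rowSums() applies
  has_na = rowSums(is.na(newdata)) > 0
  if (all(has_na)) {
    # Assumption of this sketch: with no complete row, the prediction type
    # is unknown, so plain logical NAs are returned
    return(rep(NA, nrow(newdata)))
  }
  pred_complete = predict_fun(newdata[!has_na, , drop = FALSE])
  # Indexing with NA yields an all-NA vector of the same type as the
  # predictions (this also keeps factor levels)
  result = pred_complete[rep(NA_integer_, nrow(newdata))]
  result[!has_na] = pred_complete
  result
}

# Hypothetical stand-in for a learner that errors on missing values
toy_predict = function(d) {
  stopifnot(!anyNA(d))
  d$a + d$b
}

newdata = data.frame(a = c(1, NA, 3), b = c(10, 20, NA))
pred = predict_with_na_rows(toy_predict, newdata)
```

Only the first row of `newdata` is complete, so only that row is passed to `toy_predict`; the other two positions of `pred` are NA. The `drop = FALSE` keeps the subset a data.frame even when a single column remains.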

mb706 · 2017-12-09T19:49:12Z

R/generateHyperParsEffect.R

@@ -362,7 +362,7 @@ plotHyperParsEffect = function(hyperpars.effect.data, x = NULL, y = NULL,
          regr.task = makeRegrTask(id = "interp", data = d.run[, c(x, y, z)],
            target = z)
          mod = train(lrn, regr.task)
-          prediction = predict(mod, newdata = grid)
+          prediction = predict(mod, newdata = grid[c(x, y)])


Bonus bugfix!

larskotthoff · 2017-12-10T00:08:04Z

tests/testthat/test_base_predict.R

@@ -144,3 +144,11 @@ test_that("predict works with data.table as newdata", {
  expect_warning(predict(mod, newdata = data.table(iris)), regexp = "Provided data for prediction is not a pure data.frame but from class data.table, hence it will be converted.")
 })

+test_that("predict with NA rows for learners that don't support missings automatically returns NA", {
+  mod = train("classif.knn", pid.task)


Could you also add a test for the original random forest problem please?

larskotthoff · 2017-12-10T00:09:27Z

R/predictLearner.R

+
+removeNALines = function(newdata) {
+  namat = is.na(newdata)
+  if (!any(vlapply(namat, any))) {


Is this check necessary? As far as I can see the code after would do the right thing in this case as well.

mb706 · 2017-12-10

I was wrong about the format that is.na(data.frame) returns; I'll drop this part.
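
For reference, the return format in question can be checked in base R. This is a small sketch with a made-up data.frame; `vapply` is used here as a base-R stand-in for `vlapply`, which is an assumption about how that helper behaves.

```r
# Made-up example data: one missing value in column a
df = data.frame(a = c(1, NA), b = c("x", "y"))

# is.na() on a data.frame returns a logical matrix, not a list of columns
namat = is.na(df)
is_mat = is.matrix(namat)

# Iterating over a matrix visits every cell, not every column
per_cell = vapply(namat, any, logical(1))

# Per-row NA indicator, which is what row stripping needs
row_has_na = rowSums(namat) > 0
```

Because `namat` is a matrix, `per_cell` has one entry per cell, so wrapping it in `any()` gives the same answer as `any(namat)`.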

larskotthoff · 2017-12-10T19:47:15Z

Thanks, merging.

@larskotthoff

…ort missings (#2099) * predict NA if learner doesn't support that * adding test * drop = FALSE * bugfix * using old prediction as fallback when all rows are NA * implementing @larskotthoff's suggestions

mb706 added 5 commits December 9, 2017 14:49

predict NA if learner doesn't support that

1893188

adding test

eedb60d

drop = FALSE

98aba58

bugfix

6ec45a1

using old prediction as fallback when all rows are NA

1c635d4

mb706 added the pr-please review label Dec 9, 2017

mb706 commented Dec 9, 2017

larskotthoff requested changes Dec 10, 2017

implementing @larskotthoff's suggestions

c32c0b5

larskotthoff approved these changes Dec 10, 2017

larskotthoff merged commit ed18b6a into master Dec 10, 2017

larskotthoff deleted the automatically_predict_na branch December 10, 2017 19:47

larskotthoff pushed a commit that referenced this pull request Dec 10, 2017

NEWS for #2099

dbfa449

mb706 mentioned this pull request Dec 13, 2017

predict() input data may differ from data %>>% retrafo if property 'missings' not added mlr-org/mlrCPO#7

Open

zmjones pushed a commit that referenced this pull request Dec 19, 2017

NEWS for #2099

09dff84

ja-thomas mentioned this pull request Mar 2, 2018

Discuss the changes in #2099 #2204

Closed

This was referenced Jun 6, 2019

add check for missings in newdata #2068

Closed

Handling of missings (in train + predict) mlr-org/mlr3#238

Closed

Can't predict on randomForest when test set contains NA's in features #1515

Open
