Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically predict NA for rows w/ NAs and learners that don't support missings #2099

Merged
merged 6 commits into from
Dec 10, 2017

Conversation

mb706
Copy link
Contributor

@mb706 mb706 commented Dec 9, 2017

A comprehensive fix for the larger issue around #1515, this does what I described in my comment to #2068:
If the learner doesn't support 'missings', the rows containing missings are stripped, and NAs are added to the prediction in their place. This has two weaknesses:

  1. Apparently there are Learners that don't support "missings" in their training data, but do support them in the prediction data. There is no easy fix for this, one might think about adding another Learner-property, or redefining "missings" to mean support for missings in the prediction data.
  2. If every line of the input contains at least one NA, this falls back to the old prediction mode (and possibly creates an error if the Learner doesn't silently ignore NAs). A more thorough implementation could create the matrix / vector of NAs of appropriate type without calling predictLearner at all.

@@ -362,7 +362,7 @@ plotHyperParsEffect = function(hyperpars.effect.data, x = NULL, y = NULL,
regr.task = makeRegrTask(id = "interp", data = d.run[, c(x, y, z)],
target = z)
mod = train(lrn, regr.task)
prediction = predict(mod, newdata = grid)
prediction = predict(mod, newdata = grid[c(x, y)])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bonus bugfix!

@@ -144,3 +144,11 @@ test_that("predict works with data.table as newdata", {
expect_warning(predict(mod, newdata = data.table(iris)), regexp = "Provided data for prediction is not a pure data.frame but from class data.table, hence it will be converted.")
})

test_that("predict with NA rows for learners that don't support missings automatically returns NA", {
mod = train("classif.knn", pid.task)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test for the original random forest problem please?


removeNALines = function(newdata) {
namat = is.na(newdata)
if (!any(vlapply(namat, any))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? As far as I can see the code after would do the right thing in this case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong about what format the return of is.na(data.frame) has, I'll drop this part.

@larskotthoff
Copy link
Member

Thanks, merging.

@larskotthoff larskotthoff merged commit ed18b6a into master Dec 10, 2017
@larskotthoff larskotthoff deleted the automatically_predict_na branch December 10, 2017 19:47
larskotthoff pushed a commit that referenced this pull request Dec 10, 2017
zmjones pushed a commit that referenced this pull request Dec 19, 2017
…ort missings (#2099)

* predict NA if learner doesn't support that

* adding test

* drop = FALSE

* bugfix

* using old prediction as fallback when all rows are NA

* implementing @larskotthoff's suggestions
zmjones pushed a commit that referenced this pull request Dec 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants