quickpred seemingly not detecting strong correlations & looking for faster way to specify "include" variable list #496

ivmcphail · 2022-07-16T02:11:02Z

As per the subject line: when I run quickpred(), the resulting predictor matrix seems to miss correlations that are substantially higher than the value I've set for mincor.

For example, with mincor = .2, a number of variables are left out of the imputation model according to the predictor matrix. For instance, Variables A and B, which both correlate with Variable C at r > .40, are left out of the model when Variable C is being imputed, even though the proportion of usable cases of Variable C with each of A and B is above .90. Here is the code I have used:

ini <- mice(rtc.pre,
            predictorMatrix = quickpred(rtc.pre,
                                        mincor = .2,
                                        minpuc = 0.25,
                                        method = "spearman"),
            seed = 1234,
            printFlag = FALSE)

Given this discrepancy, I am wondering whether I am doing something wrong or misunderstanding how the mincor and minpuc arguments work, or what else might be causing it. Thanks in advance.
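To check the numbers by hand, here is a base-R sketch of the two rules described in ?quickpred for a single target/predictor pair: a predictor is selected if the larger of |cor(target, predictor)| and |cor(target's missingness indicator, predictor)| exceeds mincor, and it is then dropped if its proportion of usable cases is below minpuc. Note that this proportion is computed only over the rows where the target is missing. The data are simulated stand-ins for Variables A, B and C, and screen_pair() is a hypothetical helper, not a mice function; mice's own implementation may differ in details. Also, as far as I can tell from the mice source, quickpred() correlates data.matrix(data), so factors enter as their integer codes and may not match correlations computed another way.

```r
# Hypothetical toy data: C depends on A, B is unrelated, 40 values of C are missing
set.seed(1234)
n <- 200
A <- rnorm(n)
B <- rnorm(n)
C <- 0.6 * A + rnorm(n)
C[sample(n, 40)] <- NA
dat <- data.frame(A = A, B = B, C = C)

# Hypothetical helper: applies the two rules from ?quickpred to one pair
# (a sketch of the documented criteria, not mice's actual code)
screen_pair <- function(data, target, predictor,
                        mincor = 0.2, minpuc = 0.25, method = "spearman") {
  y <- data[[target]]
  x <- data[[predictor]]
  miss <- is.na(y)
  # Rule 1a: correlation between target and predictor (available cases)
  r_obs <- abs(cor(y, x, use = "pairwise.complete.obs", method = method))
  # Rule 1b: correlation between the target's missingness indicator and the predictor
  r_ind <- abs(cor(as.numeric(miss), x, use = "pairwise.complete.obs", method = method))
  # Rule 2: among rows where the target is missing, share with the predictor observed
  puc <- sum(miss & !is.na(x)) / sum(miss)
  selected <- max(r_obs, r_ind, na.rm = TRUE) > mincor && puc >= minpuc
  c(r_obs = r_obs, r_ind = r_ind, puc = puc, selected = as.numeric(selected))
}

screen_pair(dat, target = "C", predictor = "A")
screen_pair(dat, target = "C", predictor = "B")
```

If a pair comes out with selected = 1 here but has a 0 in the quickpred() matrix, that pair would be a good candidate for a reproducible example.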

My second question: I am looking for a quicker way to specify the variables to always include in the imputation model. For instance, say I have 5 outcomes that I want to make sure are included. Using quickpred(), I have the following, which works:

ini <- mice(rtc.pre,
            method = meth.pre,
            predictorMatrix = quickpred(rtc.pre,
                                        mincor = .2,
                                        minpuc = 0.25,
                                        include = c("viorec", "genrec", "sexrec", "anyrec", "sexvio"),
                                        method = "spearman"),
            seed = 1234,
            printFlag = FALSE)

My admittedly shallow knowledge of R coding conventions led me to change this to the following, assuming these 5 outcome variables occupy columns 6 to 10 of the rtc.pre dataset:

ini <- mice(rtc.pre,
            method = meth.pre,
            predictorMatrix = quickpred(rtc.pre,
                                        mincor = .2,
                                        minpuc = 0.25,
                                        include = c(6:10),
                                        method = "spearman"),
            seed = 1234,
            printFlag = FALSE)

While the imputation runs without stopping on an error, these 5 variables do not appear as predictors for most of the imputed variables in the dataset (according to the predictor matrix). Any direction would be very much appreciated, even if only to confirm that there is no shorter way to specify the variables to always include in the imputation model.

Ian.

Answered by thomvolker

Jul 17, 2022

gerkovink · 2022-07-16T04:07:09Z

Maintainer

Can you provide a reprex()?

ivmcphail Jul 16, 2022
Author

I don't really know how to paste the code from R so that it retains its reprex() look. Apologies for that; coding is not my area of expertise. Below is the closest I could get:

mice(rtc.pre, predictorMatrix = quickpred(rtc.pre, mincor = 0.2, minpuc = 0.25, method = "spearman"), seed = 1234, printFlag = FALSE)

thomvolker · 2022-07-17T11:39:30Z

If you install the reprex package, you can simply copy all your (relevant) code to your clipboard (i.e., [ctrl + c] the code you use) and then run reprex::reprex() in the R console. Make sure that this selection of code runs on its own in a fresh R session, including the code that loads or creates your data; otherwise reprex::reprex() will just throw errors at you. This produces a fully reproducible snippet containing both the code and its output.

I hope this helps, because from a single line of code it is nearly impossible to identify what goes wrong with quickpred().
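A sketch of a self-contained script that could be copied before running reprex::reprex(). It assumes mice's built-in nhanes data as a stand-in for rtc.pre, which only you have; for your own data, pasting the output of dput(head(rtc.pre, 50)) into the script is one common way to share a sample. Printing ini$predictorMatrix next to the quickpred() result, together with ini$loggedEvents, may show whether mice() itself changed the matrix, for example by dropping constant or collinear predictors.

```r
# Self-contained: loads its own package and data, so it runs in a fresh session
library(mice)

# Stand-in for rtc.pre (assumption): the built-in nhanes data from mice
dat <- nhanes

# The predictor matrix exactly as quickpred() returns it
pred <- quickpred(dat, mincor = 0.2, minpuc = 0.25, method = "spearman")
print(pred)

# The predictor matrix as stored in the mids object after mice() has run
ini <- mice(dat, predictorMatrix = pred, seed = 1234, printFlag = FALSE)
print(ini$predictorMatrix)

# mice() records removed predictors and similar events here (NULL if nothing was logged)
print(ini$loggedEvents)
```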

With regard to your second problem, you could use include = colnames(rtc.pre)[6:10], which creates a character vector containing the names of the 6th to 10th columns; include expects variable names rather than column positions.
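A minimal sketch of that approach, again assuming the built-in nhanes data from mice as a stand-in for rtc.pre, with columns 1:2 playing the role of columns 6:10. The last line prints the predictor-matrix columns of the forced variables so you can check that they are actually used.

```r
library(mice)

# Stand-in for rtc.pre (assumption): nhanes has columns age, bmi, hyp, chl
dat <- nhanes

# Convert column positions to names; for rtc.pre this would be colnames(rtc.pre)[6:10]
always_in <- colnames(dat)[1:2]

pred <- quickpred(dat, mincor = 0.2, minpuc = 0.25,
                  include = always_in, method = "spearman")

# Columns are predictors, rows are targets: the forced variables should be 1
# for every incomplete target other than themselves (rows for variables
# without missing values, here age, are expected to stay 0)
pred[, always_in]
```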

ivmcphail Oct 13, 2022
Author

Hi, sorry for my late response, but thank you for the advice and ideas for getting past my problem!
