NAs left in mice output without any loggedEvents #349
-
After some more playing around, the issue seems to come from the fact that I did not impute the auxiliary variables. If I add methods to them, I get rid of the missing data, but the running time doubles (which is an issue, given that my data has more than 5 million rows and the first imputation attempt took 4 days to run). Is there any alternative that just uses the available information in the auxiliary variables? If there is not, it might be worth clarifying that in the documentation.
-
The All the best, Gerko
-
Thanks, Gerko. I was thinking along those lines and probably just need to impute all variables, but still don't quite get it. I have 50% missing data in I am considering setting a more efficient predictorMatrix, primarily by dropping categorical variables with many levels as predictors, but find it hard to come up with a theoretical rationale for that ... will have another look through the vignettes and through published articles in my field though.
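
For readers weighing the same trade-off, here is a minimal sketch of two ways to thin out a predictorMatrix in mice. It uses the `nhanes` data that ships with the package rather than the data from this thread, and the `mincor = 0.3` threshold and the choice of `hyp` as the dropped predictor are illustrative assumptions, not recommendations.

```r
library(mice)

# Option 1: let quickpred() keep only predictors whose absolute correlation
# with the target (or with its missingness indicator) is at least mincor.
# The threshold 0.3 is an arbitrary value chosen for illustration.
pred_quick <- quickpred(nhanes, mincor = 0.3)

# Option 2: start from the default matrix and switch one predictor off by hand.
# `hyp` stands in for a categorical variable with many levels.
pred_manual <- make.predictorMatrix(nhanes)
pred_manual[, "hyp"] <- 0   # hyp is still imputed, but no longer predicts others

# Small m and maxit keep the example fast; real analyses need more.
imp <- mice(nhanes, predictorMatrix = pred_manual,
            m = 2, maxit = 2, seed = 1, printFlag = FALSE)
```

Rows of the matrix are the variables being imputed and columns are the predictors, so zeroing a column removes that variable from every imputation model while leaving its own model untouched.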
-
Related: #263
-
@LukasWallrich I think the reason why you don't see 50% missingness in your imputed variables is that only those values stay missing where there was missingness to begin with (e.g. 16 rows for You can run the following code to check this:

```r
# `input` is assumed to be the data frame passed to mice() and `method` the
# named vector of imputation methods from the original post.
# Create a logical vector of length nrow(input) that is TRUE if any auxiliary
# variable in a row is missing
any_aux_missing <- with(
  input,
  is.na(test) | is.na(IAT_score) | is.na(att_7) | is.na(t_diff)
)
# For each variable that is imputed, check how many rows that had a missing
# value also had a missing auxiliary variable
for (col in names(method)[method != ""]) {
  n_joint_missing <- sum(is.na(input[[col]]) & any_aux_missing)
  cat(col, ":", n_joint_missing, "\n")
}
```
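
The same pattern can be reproduced on a small scale. The following is a sketch under stated assumptions, not the original poster's setup: it uses the `nhanes` data bundled with mice, skips `bmi` by giving it an empty method, and leaves the default predictor matrix in place so that `bmi` is still used as a predictor. The behaviour described in this thread may differ in other mice versions.

```r
library(mice)

# nhanes: age is complete; bmi, hyp and chl all contain missing values.
meth <- make.method(nhanes)
meth["bmi"] <- ""   # skip bmi, but keep it as a predictor (default matrix)

# Small m and maxit only to keep the example fast.
imp <- mice(nhanes, method = meth, m = 2, maxit = 2,
            seed = 1, printFlag = FALSE)
completed <- complete(imp, 1)

# How many NAs are left in each column after imputation?
print(colSums(is.na(completed)))

# Are the NAs left in chl confined to rows where the skipped bmi is missing?
left_over <- is.na(completed$chl)
print(all(is.na(nhanes$bmi[left_over])))

# No logged events are expected to explain the leftover NAs.
print(imp$loggedEvents)
```

If the check prints `TRUE`, the leftover NAs are accounted for by the unimputed predictor rather than by anything recorded in `loggedEvents`.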
-
Thank you, @prockenschaub, that's very helpful. Now I finally understand what's going on.
-
@stefvanbuuren Given how similar my mistake was to #263, I would suggest adding a note to the explanation of the method argument (or to the vignette) that auxiliary variables should be imputed. As it stands, I understood the option to exclude variables from being imputed as a quick win to reduce runtime ...
-
I have added to the
-
Hello, I have been running into this issue, as with the OP and similar to post #263. However, I am wondering why one cannot skip a given variable for imputation (column A in Dr. Van Buuren's post), still use it as an auxiliary/predictor variable, AND avoid NAs in the imputed variables (column B) when the skipped variable (column A) contains NAs. This seems to unnecessarily restrict the kinds of imputation models that can be built. I had been puzzling over getting this code to run before reading these posts, and now I am just wondering why mice behaves this way. Dr. Van Buuren, if you are able/have time, could you explain the reasoning behind not allowing an item to be skipped, included in the imputation model along with other predictors, and contain missing values? Many thanks, Ian
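
One way to build intuition for this is a plain regression analogy. The sketch below uses base R only and is not mice's internal code; the columns `a` (skipped predictor) and `b` (variable to impute) are made-up stand-ins for columns A and B above. A model for `b` given `a` cannot produce a value for a row in which `a` is itself unknown.

```r
set.seed(1)

# a: predictor that is never imputed; b: variable we want to fill in.
d <- data.frame(
  a = c(rnorm(8), NA, NA),          # rows 9 and 10 lack the predictor
  b = c(rnorm(6), NA, NA, NA, NA)   # rows 7 to 10 need a value for b
)

# lm() silently drops rows with NA in a or b when fitting.
fit <- lm(b ~ a, data = d)

# Predict b for the rows where it is missing.
to_impute <- d[is.na(d$b), ]
pred <- predict(fit, newdata = to_impute)

# Rows 7 and 8 get a prediction; rows 9 and 10 stay NA because a is missing.
print(is.na(pred))
```

In mice terms, the options that follow from this thread appear to be either imputing A as well, or setting A's column in the predictorMatrix to 0 so that it is not used as a predictor for B.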
-
Hi there, I ran into a similar problem, so I am leaving my question in this thread.

After the multiple imputation (pmm method), there are still missing values in my dataset, although the number of missing values was reduced. I have checked that there is no issue with constant values or multicollinearity, as there are no logged events. I have included most auxiliary variables in the multiple imputation. I removed 3 auxiliary variables earlier due to the presence of logged events; after that removal, there were no logged events. I have also checked that no variables/columns are completely empty, although about 7 participants did not answer any part of the survey (so about 7 rows are completely empty).

There are 14 variables in the main analyses and 10 auxiliary variables. All of them were included in the multiple imputation, and all of them contain missing values. All variables in the main analyses are continuous. Of the auxiliary variables, 6 are categorical and 4 are continuous. The categorical variables were coded as factors in R.

Why are there still missing values? Is this normal? Can anyone please advise how I can get a complete imputed dataset? If not, can I proceed to multiple mediation analysis with those missing values?

I used this code for the multiple imputation: Please see this link to part of my dataset: https://drive.google.com/file/d/1s_KNTSp4NlxvLYKhVWSPfYbBf0EeniXx/view?usp=drive_link

I've also checked out the following discussion, but they don't seem to have the relevant answer for my situation.

Thank you for your time and help in advance!!
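
One check that follows from the earlier replies is to look for variables that are skipped, contain missing values, and are still used as predictors. The sketch below is only a diagnostic idea, not a confirmed explanation for this dataset. It uses the bundled `nhanes` data with `bmi` deliberately skipped; with real data, `meth` and `pred` would be the method vector and predictorMatrix actually passed to mice().

```r
library(mice)

# Illustrative setup: skip bmi on purpose, keep the default predictor matrix.
meth <- make.method(nhanes)
meth["bmi"] <- ""
pred <- make.predictorMatrix(nhanes)

# Variables with an empty method are not imputed.
skipped <- names(meth)[meth == ""]

# Variables that contain at least one missing value.
has_na <- names(which(colSums(is.na(nhanes)) > 0))

# Variables used as a predictor in at least one active imputation model.
used <- colnames(pred)[colSums(pred[meth != "", , drop = FALSE]) > 0]

# Candidates for leaving NAs behind: skipped, incomplete, and still a predictor.
culprits <- Reduce(intersect, list(skipped, has_na, used))
print(culprits)
```

An empty result would suggest the leftover NAs have another cause, in which case comparing `colSums(is.na(complete(imp)))` against the original data row by row is a reasonable next step.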
-
I am trying to impute missing values in a dataset and am left with a rather large share of missing values. I tried to find previous issues and SO questions, and all seemed to be either related to logged events or could be fixed with remove_collinear = FALSE ... I get no logged events and remove_collinear has no effect so that I am stuck and think there might be a bug in mice - at least with regard to the absence of loggedEvents?
Given that I am trying to impute categorical variables with many categories, I can't make a very small reproducible example. However, this dataset with 500 lines works: https://drive.google.com/file/d/1n_U-BYBU-nJVar2D_5FkeOJf6ARIKrbK/view?usp=sharing. Each line has at least 3 non-missing values, yet in the output, I get NAs in each variable that I am trying to impute.
I'd be very grateful for any suggestions regarding how to get a complete imputed dataset.