How to evaluate imputation results when not all variables are significantly related to y? #334
Bigsealion
started this conversation in
Missing data methodology
Replies: 3 comments
Closing as it is not related to |
Thank you very much for your advice, which is very useful to me! I am also very sorry for raising an inappropriate issue.
I have data X, which is a scale and contains some missing values. I also have thousands of other variables, collectively called Y. I want to explore the relationship between each y and the whole of X. Following your previous comments, I run the imputation thousands of times, each time using X and a single y.
Now I want to compare different imputation methods (such as pmm, random forest, ...) on my data, using the approach described in https://stefvanbuuren.name/fimd/sec-evaluation.html. So I ran a simulation study to evaluate the parameters of a linear model between x and y.
Here is my question: not every x is significant (p < 0.05) in the linear model, so I am worried that this will reduce the effectiveness of the evaluation. My strategy is to evaluate only the x that are significant. Is that acceptable? If not, what should I do?
My detailed steps are as follows:
First, draw bootstrap samples from the complete x and y (y is a column vector chosen from the whole Y).
Second, apply a simulated missingness pattern (taken from the true pattern) to the sampled data.
Third, impute the simulated data with mice (y is included in the imputation model).
Fourth, fit a linear model of X and y on the simulated data after imputation and on the whole complete data.
Fifth, compare the model parameters using raw bias, coverage rate, average width, etc.
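To make the fifth step concrete, here is a rough sketch of how the three criteria could be computed for one coefficient across simulation replicates. It is plain Python rather than mice code, and the function name `evaluate_replicates`, its arguments, and the example numbers are all made up for illustration. It assumes the "true" value is the coefficient estimated on the whole complete data, as in the fourth step, and that each replicate yields a pooled estimate with a 95% confidence interval.

```python
from statistics import mean


def evaluate_replicates(estimates, lowers, uppers, truth):
    """Summarise simulation replicates for a single regression coefficient.

    estimates : pooled estimates, one per replicate
    lowers    : lower confidence limits, one per replicate
    uppers    : upper confidence limits, one per replicate
    truth     : reference value (assumed here: the complete-data estimate)
    """
    if not estimates or not (len(estimates) == len(lowers) == len(uppers)):
        raise ValueError("need equal-length, non-empty inputs")

    # Raw bias: average estimate minus the reference value.
    raw_bias = mean(estimates) - truth

    # Coverage rate: share of intervals that contain the reference value.
    covered = [1.0 if lo <= truth <= hi else 0.0
               for lo, hi in zip(lowers, uppers)]
    coverage_rate = mean(covered)

    # Average width: mean length of the confidence intervals.
    average_width = mean([hi - lo for lo, hi in zip(lowers, uppers)])

    return {
        "raw_bias": raw_bias,
        "coverage_rate": coverage_rate,
        "average_width": average_width,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not results from any real simulation.
    result = evaluate_replicates(
        estimates=[0.9, 1.1, 1.3],
        lowers=[0.5, 0.7, 1.05],
        uppers=[1.3, 1.5, 1.55],
        truth=1.0,
    )
    print(result)
```

In this sketch the function would be called once per coefficient and per imputation method, so the criteria are computed the same way whether or not a given x is significant in the complete-data model.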
Thank you very much for your reply!