How to evaluate imputation results when not all variables are significantly related to y? #334

Bigsealion · 2021-01-13T07:07:44Z

I have data X, which is a scale and contains some missing values. I also have thousands of other variables, collectively Y. I want to explore the relationship between each y and the whole of X. Following your previous comments, I run the imputation thousands of times, each time using X and one y.
Now I want to compare different imputation methods (e.g., pmm, randforest, ...) on my data, following the approach in https://stefvanbuuren.name/fimd/sec-evaluation.html. So I ran a simulation study to evaluate the parameters of a linear model regressing y on X.
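The per-outcome loop described above (one imputation run per y, each time with the whole of X) could be organised roughly like this. This is only a sketch: `impute_fn` is a hypothetical placeholder for the real imputation call (mice lives in R), and `mean_impute` is a deliberately crude stand-in included just so the example runs; it is not a proper multiple-imputation method.

```python
import numpy as np

def per_outcome_fits(X, Y, impute_fn, m=5):
    """Impute X together with one outcome column at a time, then fit y_j ~ X.

    impute_fn(data, m) is a hypothetical placeholder for the real imputation
    call (e.g. mice in R); it must return m completed copies of `data`.
    Returns an (n_outcomes, p + 1) array of pooled coefficients,
    intercept first; the pooled point estimate is the mean over the m fits.
    """
    n, p = X.shape
    pooled = np.empty((Y.shape[1], p + 1))
    for j in range(Y.shape[1]):
        # One imputation run per outcome: the whole of X plus the single column y_j.
        data = np.column_stack([X, Y[:, j]])
        coefs = []
        for completed in impute_fn(data, m):
            design = np.column_stack([np.ones(n), completed[:, :p]])
            beta, *_ = np.linalg.lstsq(design, completed[:, p], rcond=None)
            coefs.append(beta)
        pooled[j] = np.mean(coefs, axis=0)
    return pooled

def mean_impute(data, m):
    """Crude stand-in imputer for illustration only: column-mean fill, copied m times."""
    filled = np.where(np.isnan(data), np.nanmean(data, axis=0), data)
    return [filled.copy() for _ in range(m)]
```

Keeping the imputer as a parameter means the same loop can be reused for each imputation method being compared.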

Here is my question: not every x is significant (p < 0.05) in the linear model, so I am worried that this will reduce the effectiveness of the evaluation. My strategy is to evaluate only the x that are significant. Is that acceptable? If not, what should I do?

My detailed steps are as follows:
First, draw bootstrap samples from the complete X and y (y is a column vector chosen from Y).
Second, apply a simulated missing-data pattern (derived from the true pattern) to the sampled data.
Third, impute the simulated data with mice (y is included in the imputation model).
Fourth, fit the linear model of y on X in the simulated data and in the whole complete data.
Fifth, compare the model parameters using raw bias, coverage rate, average width, etc.
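The five steps can be sketched end to end in Python. This is an illustration under stated assumptions, not the poster's code: `impute_norm` is a simplified chained Bayesian-regression imputer written for this example (similar in spirit to mice's `norm` method, but not mice), MCAR deletion with probability `p_miss` stands in for the observed missing-data pattern, and the intervals use a normal reference distribution rather than the t reference that mice's pooling uses.

```python
import numpy as np
from statistics import NormalDist

def fit_ols(X, y):
    """OLS with intercept; returns coefficients and their covariance matrix."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    return beta, sigma2 * np.linalg.inv(A.T @ A)

def impute_norm(data, rng, n_iter=5):
    """One completed copy of `data`: chained Bayesian linear-regression imputation.
    Simplified stand-in written for this sketch, not mice itself."""
    miss = np.isnan(data)
    filled = np.where(miss, np.nanmean(data, axis=0), data)  # start from column means
    for _ in range(n_iter):
        for k in np.flatnonzero(miss.any(axis=0)):
            obs = ~miss[:, k]
            # Predict column k from all other (currently completed) columns, y included.
            A = np.column_stack([np.ones(len(filled)), np.delete(filled, k, axis=1)])
            beta_hat, *_ = np.linalg.lstsq(A[obs], filled[obs, k], rcond=None)
            resid = filled[obs, k] - A[obs] @ beta_hat
            # Draw sigma^2, then beta, from their posterior under a flat prior.
            sigma2 = resid @ resid / rng.chisquare(obs.sum() - A.shape[1])
            beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(A[obs].T @ A[obs]))
            filled[~obs, k] = A[~obs] @ beta + rng.normal(0.0, np.sqrt(sigma2), (~obs).sum())
    return filled

def pool(fits):
    """Rubin's rules per coefficient: pooled estimate and total-variance SE."""
    betas = np.array([b for b, _ in fits])
    ubar = np.mean([np.diag(v) for _, v in fits], axis=0)
    t = ubar + (1 + 1 / len(fits)) * betas.var(axis=0, ddof=1)
    return betas.mean(axis=0), np.sqrt(t)

def simulate(X, y, n_sim=200, m=5, p_miss=0.2, seed=1):
    rng = np.random.default_rng(seed)
    true_beta, _ = fit_ols(X, y)                    # step 4: fit on the whole complete data
    z = NormalDist().inv_cdf(0.975)                 # normal approximation for 95% CIs
    est, lo, hi = [], [], []
    for _ in range(n_sim):
        idx = rng.integers(0, len(y), len(y))       # step 1: bootstrap sample
        Xs = X[idx].astype(float)
        Xs[rng.random(Xs.shape) < p_miss] = np.nan  # step 2: MCAR stand-in for the real pattern
        data = np.column_stack([Xs, y[idx]])        # step 3: y is in the imputation model
        completed = [impute_norm(data, rng) for _ in range(m)]
        qbar, se = pool([fit_ols(d[:, :-1], d[:, -1]) for d in completed])  # step 4
        est.append(qbar)
        lo.append(qbar - z * se)
        hi.append(qbar + z * se)
    est, lo, hi = np.array(est), np.array(lo), np.array(hi)
    return {                                        # step 5: criteria per coefficient
        "raw_bias": est.mean(axis=0) - true_beta,
        "coverage": ((lo <= true_beta) & (true_beta <= hi)).mean(axis=0),
        "avg_width": (hi - lo).mean(axis=0),
    }
```

`simulate(X, y)` returns raw bias, coverage rate and average width for the intercept and every slope, whether or not a slope is significant; comparing methods would amount to swapping `impute_norm` for another imputer.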

Thank you very much for your reply!

gerkovink (Maintainer) · 2021-01-14T14:20:11Z

If I understand correctly, the aim of your analysis is not to evaluate a single analysis model, but to evaluate all univariate models for which the parameter is significant at the $\alpha = .05$ level.

First, studying the bias $\beta - \hat{\beta}$, the coverage of the CI for $\hat\beta$ and the width of that CI seems to me to be independent of the p-value for $\beta$ (or $\hat\beta$, for that matter). A highly significant parameter can easily be rendered confidence invalid (i.e., have an actual coverage below the nominal $1 - \alpha$ level) when, for example, the imputation method fails to sufficiently address the statistical properties of the data, when the imputation model is misspecified, or when the nonresponse is not properly addressed.
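To illustrate that the evaluation criteria do not depend on significance, here is a minimal complete-data sketch (no missing data and no imputation; the coefficient values, sample size and number of replications are hypothetical). One slope is large and almost always significant, the other is exactly zero and rarely significant, yet both confidence intervals are expected to cover their true values at close to the nominal rate.

```python
import numpy as np
from statistics import NormalDist

def coverage_vs_significance(beta=(1.0, 0.5, 0.0), n=200, n_sim=2000, alpha=0.05, seed=7):
    """Complete-data OLS simulation. Per coefficient, returns the coverage rate
    of the (1 - alpha) CI and the share of replications in which the
    coefficient is significant at level alpha."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to the t quantile
    covered = np.zeros(len(beta))
    significant = np.zeros(len(beta))
    for _ in range(n_sim):
        A = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
        y = A @ beta + rng.normal(size=n)
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ b
        se = np.sqrt(np.diag(resid @ resid / (n - len(beta)) * np.linalg.inv(A.T @ A)))
        covered += np.abs(b - beta) <= z * se  # CI contains the true value
        significant += np.abs(b) > z * se      # two-sided test of beta_j = 0 rejects
    return covered / n_sim, significant / n_sim
```

Run on imputed rather than complete data, a drop in coverage for either slope would point to a problem with the imputation, whatever that slope's p-value.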

Let's assume that your model yields fitted values $\hat{y} = \mathbf{X}\hat{\beta}$ and residuals $y - \hat{y} = \hat{\epsilon}$.
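Under this model, the fitted value at one covariate profile $\mathbf{x}_0$ (a row of $\mathbf{X}$, including the intercept) is a single scalar, and its variance follows from the covariance matrix of the coefficients. Across $m$ imputed datasets it can then be pooled with Rubin's rules like any other scalar estimand:

```latex
\begin{aligned}
\hat{y}_{0,\ell} &= \mathbf{x}_0^\top \hat{\beta}_\ell, &
U_\ell &= \mathbf{x}_0^\top \widehat{\operatorname{Var}}(\hat{\beta}_\ell)\, \mathbf{x}_0, \qquad \ell = 1, \dots, m,\\
\bar{Q} &= \frac{1}{m}\sum_{\ell=1}^{m} \hat{y}_{0,\ell}, &
\bar{U} &= \frac{1}{m}\sum_{\ell=1}^{m} U_\ell,\\
B &= \frac{1}{m-1}\sum_{\ell=1}^{m} \left(\hat{y}_{0,\ell} - \bar{Q}\right)^2, &
T &= \bar{U} + \left(1 + \frac{1}{m}\right) B.
\end{aligned}
```

A $(1-\alpha)$ interval is then $\bar{Q} \pm q\sqrt{T}$, with $q$ the appropriate quantile of the reference distribution (a $t$ distribution in Rubin's rules).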

A good way to quantify the performance of the imputation approach, when you do not wish to inspect every parameter for $\mathbf{X}$, is to limit your investigation to the fitted values of the outcome, $\hat{y}$. Study the bias, coverage rate and confidence interval width for this quantity, and you are done: all information about the predictors in $\mathbf{X}$ is summarised in the fitted values. You could even shift your evaluation to the individual level: calculate a bootstrap prediction interval from, e.g., $m = 100$ imputations, and use these intervals to calculate the coverage rate of the true individual-level value.
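The fitted-value evaluation could be sketched as follows. The assumptions are mine: each completed-data fit is available as a `(beta, cov_beta)` pair, `X0` holds the covariate profiles at which fitted values are evaluated, `pool_fitted` and `evaluate` are hypothetical helper names, and the interval uses a normal reference distribution instead of the t reference with Barnard-Rubin degrees of freedom that mice's `pool()` uses.

```python
import numpy as np
from statistics import NormalDist

def pool_fitted(fits, X0, alpha=0.05):
    """Pool fitted values X0 @ beta over m >= 2 completed-data fits (Rubin's rules).

    fits: list of (beta, cov_beta) pairs, one per imputed dataset.
    X0:   (k, p) covariate profiles, including the intercept column.
    Returns pooled fitted values and lower/upper CI bounds, each of length k.
    """
    X0 = np.atleast_2d(np.asarray(X0, dtype=float))
    q = np.array([X0 @ b for b, _ in fits])  # (m, k) fitted values per imputation
    # Within-imputation variances x0' Var(beta) x0, one per profile.
    u = np.array([np.einsum("ij,jk,ik->i", X0, v, X0) for _, v in fits])
    m = len(fits)
    qbar = q.mean(axis=0)
    t = u.mean(axis=0) + (1 + 1 / m) * q.var(axis=0, ddof=1)  # total variance
    half = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(t)
    return qbar, qbar - half, qbar + half

def evaluate(truth, est, lower, upper):
    """Raw bias, coverage rate and average width over simulation replications."""
    est, lower, upper = map(np.asarray, (est, lower, upper))
    return {
        "raw_bias": est.mean(axis=0) - truth,
        "coverage": ((lower <= truth) & (truth <= upper)).mean(axis=0),
        "avg_width": (upper - lower).mean(axis=0),
    }
```

In a simulation, `pool_fitted` would be called once per replication and the stacked results passed to `evaluate`, with the truth taken as the fitted values from the whole complete data. One small set of criteria per profile then replaces one set per coefficient.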

Hope this helps.

All the best,

Gerko

gerkovink (Maintainer) · 2021-01-14T14:21:16Z

Closing, as this is not related to mice.

Bigsealion (Author) · 2021-01-16T02:03:26Z

Thank you very much for your advice; it is very useful to me! And I am very sorry for raising an inappropriate issue.
