Abstract
To handle principal component analysis (PCA)-based missing data with high correlation, we propose a novel imputation algorithm called iterated score regression. The procedure first introduces a transformation matrix that separates the missing values and the observed values into two data blocks, and then uses these data blocks, the score matrix, and the PCA model to construct the related regression equations. The update of the estimates at each iteration is highlighted. We examine the sensitivity of the proposed algorithm to standard deviations, correlation coefficients, missing proportions, numbers of variables, and sample sizes, over different intervals of the standard deviations and correlation coefficients. For comparison, we propose modifications of three widely used algorithms that also deal with missing data, though not with highly correlated data. In our numerical studies, the proposed algorithm always attains the smallest mean squared error (MSE) among the competitors considered, which indicates its stability and accuracy. Its advantage is further illustrated on three real data sets with missing values.
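As a rough illustration of the general idea described in the abstract (separate each record into observed and missing blocks, regress the observed block on the PCA loadings to estimate the scores, predict the missing block, and iterate), a minimal sketch is given here. It is a generic iterative PCA regression imputation, not the iterated score regression algorithm of the paper itself; the function name `pca_regression_impute`, its parameters, and the mean-fill initialisation are assumptions made for this example only.

```python
import numpy as np


def pca_regression_impute(X, n_components=1, n_iter=50, tol=1e-8):
    """Generic iterative PCA-based imputation (illustrative sketch only).

    Missing entries are marked with NaN. This is NOT the paper's exact
    algorithm; it only mirrors the block / score / regression idea.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)

    # Assumed initialisation: fill missing entries with column means.
    X_hat = np.where(miss, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        # Fit a PCA model to the current completed data.
        mu = X_hat.mean(axis=0)
        Xc = X_hat - mu
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T  # loadings, shape (p, k)

        X_new = X_hat.copy()
        for i in np.flatnonzero(miss.any(axis=1)):
            m = miss[i]   # missing block of row i
            o = ~m        # observed block of row i
            if not o.any():
                continue  # nothing observed: keep the current fill
            # Estimate the scores by least squares on the observed block.
            t, *_ = np.linalg.lstsq(P[o], Xc[i, o], rcond=None)
            # Predict the missing block from the estimated scores.
            X_new[i, m] = mu[m] + P[m] @ t

        change = np.max(np.abs(X_new - X_hat))
        X_hat = X_new
        if change < tol:
            break

    return X_hat
```

Calling `pca_regression_impute(X, n_components=1)` on an array with `NaN` entries returns a completed array in which the observed entries are left unchanged. The number of components is a tuning choice in this sketch; with highly correlated variables a small number is usually a reasonable starting point.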
| Original language | English |
|---|---|
| Article number | 9067 |
| Number of pages | 27 |
| Journal | Scientific Reports |
| Volume | 15 |
| Issue number | 1 |
| Publication status | Published - 17 Mar 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
User-Defined Keywords
- High correlation
- Iterated score regression
- Missing data
- Principal component analysis
- Sensitivity
Fingerprint
Dive into the research topics of 'The iterated score regression estimation algorithm for PCA-based missing data with high correlation'. Together they form a unique fingerprint.