The iterated score regression estimation algorithm for PCA-based missing data with high correlation

Guangbao Guo*, Haoyue Song, Lixing Zhu

*Corresponding author for this work

Research output: Contribution to journal; Journal article; peer-review

Abstract

To handle highly correlated missing data in principal component analysis (PCA), we propose a novel imputation algorithm called iterated score regression. The procedure first introduces a transformation matrix that places the missing values and the observed values into two data blocks, and then uses the data blocks, the score matrix, and the PCA model to construct the related regression equations. We highlight how the estimate is updated at each iteration. We examine the sensitivity of the proposed algorithm to standard deviations, correlation coefficients, missing proportions, numbers of variables, and sample sizes, over different intervals of the standard deviations and correlation coefficients. For comparison, we also suggest modifications of three popular algorithms that are used for missing data that are not highly correlated. In the numerical studies we conducted, the mean squared error (MSE) of the proposed algorithm is always the smallest among the competitors we consider, which indicates its stability and accuracy. Its advantage is further illustrated on three real data sets with missing values.
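
To make the idea of regressing on PCA scores inside an iteration more concrete, the following Python sketch shows a generic iterative PCA imputation. It is not the authors' iterated score regression algorithm, whose transformation matrix and update rule are not given in this abstract. The function name `iterative_pca_impute`, the parameter `n_components`, the column-mean initialisation, and the stopping rule are all assumptions made for illustration. For each incomplete row, the observed block is regressed on the matching rows of the loading matrix to estimate the scores, and the missing block is then predicted from those scores.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-8):
    """Generic iterative PCA imputation (illustrative sketch only).

    Assumes every row has at least `n_components` observed entries
    and no column is entirely missing.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed initialisation: fill each missing entry with its column mean.
    X_hat = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(max_iter):
        mu = X_hat.mean(axis=0)
        Xc = X_hat - mu

        # PCA loadings P (p x k) from the SVD of the centred, filled-in data.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T

        X_new = X_hat.copy()
        for i in np.flatnonzero(missing.any(axis=1)):
            m = missing[i]   # missing block of row i
            o = ~m           # observed block of row i
            # Scores: least-squares regression of the observed block
            # on the corresponding rows of the loadings.
            t, *_ = np.linalg.lstsq(P[o], Xc[i, o], rcond=None)
            # Predict the missing block from the estimated scores.
            X_new[i, m] = mu[m] + P[m] @ t

        change = np.linalg.norm(X_new[missing] - X_hat[missing])
        X_hat = X_new
        if change < tol * (1.0 + np.linalg.norm(X_hat[missing])):
            break

    return X_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Highly correlated toy data: two latent factors drive six variables.
    data = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6))
    data[rng.choice(50, size=10, replace=False), 0] = np.nan
    print(iterative_pca_impute(data, n_components=2)[:3])
```

Observed entries are never overwritten; only the missing block of each row is updated from one iteration to the next.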

Original language: English
Article number: 9067
Number of pages: 27
Journal: Scientific Reports
Volume: 15
Issue number: 1
DOIs
Publication status: Published - 17 Mar 2025

User-Defined Keywords

  • High correlation
  • Iterated score regression
  • Missing data
  • Principal component analysis
  • Sensitivity
