## Abstract

Because highly correlated data arise in many scientific fields, we investigate parameter estimation in a semiparametric regression model with a diverging number of highly correlated predictors. For this, we first develop a distribution-weighted least squares estimator that can recover directions in the central subspace, and then use this estimator as a seed vector, projecting it onto a Krylov space by partial least squares to avoid computing the inverse of the covariance of the predictors. Thus, distribution-weighted partial least squares can handle cases with high dimensional and highly correlated predictors. Furthermore, we suggest an iterative algorithm for obtaining a better initial value before implementing partial least squares. For theoretical investigation, we obtain strong consistency and asymptotic normality when the dimension p of the predictors is of convergence rate O{n^{1/2}/log(n)} and o(n^{1/3}) respectively, where n is the sample size. When there are no other constraints on the covariance of the predictors, the rates n^{1/2} and n^{1/3} are optimal. We also propose a Bayesian-information-criterion-type criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n-large p' problems.
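The core construction in the abstract can be illustrated numerically: form a seed vector from the covariance between the predictors and the empirical distribution function of the response, then project it onto a Krylov space built from powers of the sample covariance, so that no covariance inverse is ever computed. The sketch below, in NumPy, is only an illustrative rendering of that idea under a toy single-index model; the function name `dwls_pls`, the choice of Krylov dimension `q`, and the simulation settings are all assumptions, not the authors' exact estimator.

```python
import numpy as np

def dwls_pls(X, y, q=3):
    """Illustrative sketch: distribution-weighted seed vector projected
    onto a Krylov space, partial-least-squares style, without inverting
    the sample covariance of X. Not the authors' exact procedure."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Empirical distribution function F_n(y_i), via ranks
    Fy = np.argsort(np.argsort(y)) / n
    # Seed vector: sample covariance between X and F_n(Y)
    s = Xc.T @ (Fy - Fy.mean()) / n
    Sigma = Xc.T @ Xc / n  # sample covariance of the predictors
    # Krylov basis R = [s, Sigma s, ..., Sigma^{q-1} s]
    R = np.empty((p, q))
    v = s.copy()
    for j in range(q):
        R[:, j] = v
        v = Sigma @ v
    # Project the seed onto the Krylov space; only a small q x q solve,
    # no p x p inverse is needed
    coef = np.linalg.solve(R.T @ Sigma @ R, R.T @ s)
    return R @ coef

# Toy usage: single-index model with AR(1)-correlated predictors
rng = np.random.default_rng(0)
n, p, rho = 500, 10, 0.8
Sigma0 = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)
beta = np.zeros(p)
beta[0] = 1.0
y = X @ beta + 0.2 * rng.standard_normal(n)
b = dwls_pls(X, y, q=3)
b = b / np.linalg.norm(b)  # estimated direction in the central subspace
```

In this sketch the estimated direction `b` should align (up to sign) with the true index direction `beta`, despite the strong correlation among predictors, since the Krylov projection approximates the action of the covariance inverse on the seed vector.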

| Original language | English |
| --- | --- |
| Pages (from-to) | 525-548 |
| Number of pages | 24 |
| Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
| Volume | 71 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Apr 2009 |

## Scopus Subject Areas

- Statistics and Probability
- Statistics, Probability and Uncertainty

## User-Defined Keywords

- Central subspace
- Collinearity
- Distribution function
- Inverse regression
- Least squares estimation
- Partial least squares