TY - JOUR
T1 - On distribution-weighted partial least squares with diverging number of highly correlated predictors
AU - Zhu, Li Ping
AU - Zhu, Lixing
N1 - Copyright: Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2009/4
Y1 - 2009/4
AB - Because highly correlated data arise in many scientific fields, we investigate parameter estimation in a semiparametric regression model with a diverging number of highly correlated predictors. To this end, we first develop a distribution-weighted least squares estimator that can recover directions in the central subspace. We then use this estimator as a seed vector and project it onto a Krylov space by partial least squares, which avoids computing the inverse of the covariance matrix of the predictors. Distribution-weighted partial least squares can therefore handle high-dimensional, highly correlated predictors. We also suggest an iterative algorithm for obtaining a better initial value before applying partial least squares. Theoretically, we establish strong consistency and asymptotic normality when the dimension p of the predictors is of order O{n^{1/2}/log(n)} and o(n^{1/3}) respectively, where n is the sample size. When no other constraints are placed on the covariance of the predictors, the rates n^{1/2} and n^{1/3} are optimal. We also propose a Bayesian information criterion (BIC)-type criterion to estimate the dimension of the Krylov space in the partial least squares procedure. Illustrative examples with a real data set and comprehensive simulations demonstrate that the method is robust to non-ellipticity and works well even in 'small n, large p' problems.
KW - Central subspace
KW - Collinearity
KW - Distribution function
KW - Inverse regression
KW - Least squares estimation
KW - Partial least squares
UR - http://www.scopus.com/inward/record.url?scp=62849115699&partnerID=8YFLogxK
U2 - 10.1111/j.1467-9868.2008.00697.x
DO - 10.1111/j.1467-9868.2008.00697.x
M3 - Journal article
AN - SCOPUS:62849115699
SN - 1369-7412
VL - 71
SP - 525
EP - 548
JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology
IS - 2
ER -