TY - JOUR
T1 - Model-free feature screening for ultrahigh-dimensional data
AU - Zhu, Li Ping
AU - Li, Lexin
AU - Li, Runze
AU - Zhu, Lixing
N1 - Funding Information:
Li-Ping Zhu is Associate Professor, School of Statistics and Management, Shanghai University of Finance and Economics (E-mail: zhu.liping@mail.shufe.edu.cn). Lexin Li is Associate Professor, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203 (E-mail: [email protected]). Runze Li is the corresponding author and Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111 (E-mail: [email protected]). Li-Xing Zhu is Chair Professor of Statistics, Department of Mathematics, Hong Kong Baptist University, Hong Kong (E-mail: [email protected]). Li-Ping Zhu’s research was supported by National Natural Science Foundation of China grant 11071077 and National Institute on Drug Abuse (NIDA) grant R21-DA024260. Lexin Li’s research was supported by NSF grant DMS 1106668. Runze Li’s research was supported by NSF grant DMS 0348869, National Natural Science Foundation of China grant 11028103, and National Institute on Drug Abuse (NIDA) grant P50-DA10075. Li-Xing Zhu’s research was supported by Research Grants Council of Hong Kong grant HKBU2034/09P. The authors are grateful to Dr. Yichao Wu for sharing the ideas through personal communication about the iterative screening approach presented in this article. The authors thank the editor, the associate editor, and reviewers for their suggestions, which have helped greatly improve the article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NSF or NIDA.
PY - 2011/12
Y1 - 2011/12
AB - With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on regression functions, and thus is particularly appealing to ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model forms. We demonstrate that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses consistency in ranking, which is both useful in its own right and can lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits a competent empirical performance in our intensive simulations and real data analysis.
KW - Feature ranking
KW - Ultrahigh-dimensional regression
KW - Variable selection
UR - http://www.scopus.com/inward/record.url?scp=84862928752&partnerID=8YFLogxK
U2 - 10.1198/jasa.2011.tm10563
DO - 10.1198/jasa.2011.tm10563
M3 - Journal article
AN - SCOPUS:84862928752
SN - 0162-1459
VL - 106
SP - 1464
EP - 1475
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 496
ER -