TY - JOUR
T1 - Greedy forward regression for variable screening
AU - Cheng, Ming Yen
AU - Feng, Sanying
AU - Li, Gaorong
AU - Lian, Heng
N1 - Funding Information:
The authors sincerely thank the Handling Editor, Professor Michael Martin, the Associate Editor, and two anonymous reviewers for their insightful comments and suggestions that greatly improved the paper. Cheng's research was supported by the Ministry of Science and Technology grant 104-2118-M-002-005-MY3. Lian's research was supported by City University of Hong Kong Start-up Grant 7200521. Feng's research was supported by the National Natural Science Foundation of China (11501522). Li's research was supported by the National Natural Science Foundation of China (11471029), the Beijing Natural Science Foundation (1142002), the Science and Technology Project of Beijing Municipal Education Commission (KM201410005010) and the Program for JingHua Talents in Beijing University of Technology (2013-JH-L07).
N1 - Affiliations: 1 Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong (corresponding author; e-mail: [email protected]); 2 School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China; 3 Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China; 4 Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong.
PY - 2018/3
Y1 - 2018/3
N2 - In the ultra-high dimensional setting, two popular variable screening methods with the desirable sure screening property are sure independence screening (SIS) and forward regression (FR). Both are classical variable screening methods that have recently attracted increasing attention in high-dimensional data analysis. We consider a new and simple screening method that incorporates multiple predictors at each step of forward regression, deciding which variables to incorporate using the same criterion as FR. If only one step is carried out, the new procedure reduces to SIS; thus it can be regarded as a generalisation and unification of FR and SIS. More importantly, it preserves the sure screening property and has computational complexity similar to that of FR at each step, yet it can discover the relevant covariates in fewer steps. Thus it reduces the computational burden of FR drastically while retaining the latter's advantages over SIS. Furthermore, we show that it can find all the true variables if the number of steps taken equals the correct model size, which is a new theoretical result even for the original FR. An extensive simulation study and application to two real data examples demonstrate the excellent performance of the proposed method.
AB - In the ultra-high dimensional setting, two popular variable screening methods with the desirable sure screening property are sure independence screening (SIS) and forward regression (FR). Both are classical variable screening methods that have recently attracted increasing attention in high-dimensional data analysis. We consider a new and simple screening method that incorporates multiple predictors at each step of forward regression, deciding which variables to incorporate using the same criterion as FR. If only one step is carried out, the new procedure reduces to SIS; thus it can be regarded as a generalisation and unification of FR and SIS. More importantly, it preserves the sure screening property and has computational complexity similar to that of FR at each step, yet it can discover the relevant covariates in fewer steps. Thus it reduces the computational burden of FR drastically while retaining the latter's advantages over SIS. Furthermore, we show that it can find all the true variables if the number of steps taken equals the correct model size, which is a new theoretical result even for the original FR. An extensive simulation study and application to two real data examples demonstrate the excellent performance of the proposed method.
KW - big data problems
KW - high-dimensional statistical inference
KW - model selection
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85044007312&partnerID=8YFLogxK
U2 - 10.1111/anzs.12218
DO - 10.1111/anzs.12218
M3 - Journal article
AN - SCOPUS:85044007312
SN - 1369-1473
VL - 60
SP - 20
EP - 42
JO - Australian and New Zealand Journal of Statistics
JF - Australian and New Zealand Journal of Statistics
IS - 1
ER -