In the ultra-high-dimensional setting, two popular variable screening methods with the desirable sure screening property are sure independence screening (SIS) and forward regression (FR). Both are classical variable screening methods that have recently attracted greater attention in high-dimensional data analysis. We consider a new and simple screening method that incorporates multiple predictors at each step of forward regression, with the variables to incorporate chosen by the same criterion. If only one step is carried out, the new procedure reduces to SIS, so it can be regarded as a generalisation and unification of FR and SIS. More importantly, it preserves the sure screening property and has computational complexity similar to that of FR at each step, yet it can discover the relevant covariates in fewer steps. It therefore reduces the computational burden of FR drastically while retaining the advantages of FR over SIS. Furthermore, we show that it can find all the true variables if the number of steps taken equals the correct model size, which is a new theoretical result even for the original FR. An extensive simulation study and applications to two real data examples demonstrate the excellent performance of the proposed method.
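The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a linear model, no constant columns in the design matrix, and that the shared criterion is the reduction in residual sum of squares when a single candidate is added to the current model. The function name `group_forward_screen` and the parameters `group_size` and `n_steps` are hypothetical.

```python
import numpy as np

def group_forward_screen(X, y, group_size, n_steps):
    """Hypothetical sketch: add `group_size` predictors per forward step.

    Assumes X (n x p) has no constant columns. Candidates are ranked by
    the drop in residual sum of squares each would give if added alone
    to the currently selected set.
    """
    n, p = X.shape
    # Centre and scale columns to unit norm so scores are comparable.
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.linalg.norm(Xc, axis=0)
    yc = y - y.mean()
    selected = []
    for _ in range(n_steps):
        if selected:
            # Project the selected columns out of y and every candidate.
            Q, _ = np.linalg.qr(Xc[:, selected])
            R = Xc - Q @ (Q.T @ Xc)
            r_y = yc - Q @ (Q.T @ yc)
        else:
            R, r_y = Xc, yc
        num = (R.T @ r_y) ** 2
        den = np.sum(R ** 2, axis=0)
        # RSS reduction for each candidate; guard against zero residuals.
        score = np.zeros(p)
        ok = den > 1e-12
        score[ok] = num[ok] / den[ok]
        score[selected] = -np.inf  # never re-select a variable
        best = np.argsort(-score)[:group_size]
        selected.extend(int(j) for j in best)
    return selected
```

Under these assumptions, `n_steps=1` ranks predictors by squared marginal correlation with the response, which matches SIS, while `group_size=1` gives ordinary forward regression.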
Scopus Subject Areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- big data problems
- high-dimensional statistical inference
- model selection
- variable selection