Abstract
In this paper, we investigate two-stage ranking-selection procedures for ultra-high dimensional data in the framework of goodness-of-fit testing. In the first stage, we develop a k-step marginal F-test (MFTk) screening procedure. As a statistic, the MFT1 is equivalent to the one used in sure independence screening (SIS), proposed by Fan and Lv (2008). The MFTk with k≥2 improves on the MFT1 mainly by handling correlations among predictors better. To select a more parsimonious working model in the first stage, we propose a soft threshold cutoff based on sequential goodness-of-fit testing. This avoids some drawbacks of the hard threshold cutoff in Fan and Lv (2008) and of the extended BIC used in Wang (2009). In the second stage, we develop a one-step backward screening to remove the remaining insignificant predictors from the model. Further, as with iterative SIS, we provide iterative versions of the proposed procedures to achieve more accurate variable selection. Extensive numerical studies and a real data analysis are carried out to examine the performance of the proposed procedures.
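As an illustration of the first-stage idea only, the following is a minimal sketch of the k = 1 case, not the authors' implementation. It assumes each predictor is tested in a simple linear regression with an intercept, so that the marginal F-statistic is F_j = (n − 2) r_j² / (1 − r_j²), where r_j is the sample correlation between predictor j and the response. Because F_j is increasing in |r_j|, ranking by it gives the same order as ranking by absolute marginal correlation. The function names `marginal_f_statistics` and `rank_by_marginal_f` and the fixed cutoff `d` are hypothetical; the k≥2 steps, the soft threshold cutoff, and the backward screening are not specified in the abstract and are not attempted here.

```python
def marginal_f_statistics(X, y):
    """Marginal F-statistic of each column of X against y (assumed form).

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    Each predictor is fitted alone with an intercept, so
    F_j = (n - 2) * r_j**2 / (1 - r_j**2), with r_j the sample correlation.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    p = len(X[0])
    y_mean = sum(y) / n
    y_centered = [v - y_mean for v in y]
    syy = sum(v * v for v in y_centered)
    stats = []
    for j in range(p):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        x_centered = [v - x_mean for v in col]
        sxx = sum(v * v for v in x_centered)
        sxy = sum(a * b for a, b in zip(x_centered, y_centered))
        if sxx == 0.0 or syy == 0.0:
            # A constant column (or constant response) carries no marginal signal.
            stats.append(0.0)
            continue
        # Cap r^2 just below 1 so a perfect fit does not divide by zero.
        r2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)
        stats.append((n - 2) * r2 / (1.0 - r2))
    return stats


def rank_by_marginal_f(X, y, d):
    """Indices of the d predictors with the largest marginal F-statistics."""
    stats = marginal_f_statistics(X, y)
    order = sorted(range(len(stats)), key=lambda j: stats[j], reverse=True)
    return order[:d]


if __name__ == "__main__":
    # Toy data for illustration only: 4 observations, 2 predictors.
    X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
    y_toy = [1.0, 2.0, 3.0, 5.0]
    print(marginal_f_statistics(X_toy, y_toy))
    print(rank_by_marginal_f(X_toy, y_toy, 1))
```

The sketch keeps a fixed number `d` of top-ranked predictors, which corresponds to a hard threshold cutoff rather than the soft threshold the abstract proposes.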
Original language | English |
---|---|
Pages (from-to) | 148-164 |
Number of pages | 17 |
Journal | Journal of Statistical Planning and Inference |
Volume | 145 |
Publication status | Published - Feb 2014 |
Scopus Subject Areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
- Applied Mathematics
User-Defined Keywords
- Backward screening
- Linear model
- Marginal effect
- Sequential goodness-of-fit testing