There is a vast amount of work on high-dimensional regression. The common starting point for the existing theoretical work is to assume that the data-generating model is a homoscedastic linear regression model with some sparsity structure. In reality, the homoscedasticity assumption is often violated, and hence understanding the heteroscedasticity of the data is of critical importance. In this article we systematically study the estimation of a high-dimensional heteroscedastic regression model. In particular, the emphasis is on how to detect and estimate the heteroscedasticity effects reliably and efficiently. To this end, we propose a cross-fitted residual regression approach, prove that the resulting estimator is selection consistent for heteroscedasticity effects, and establish its rates of convergence. The estimator involves tuning parameters that must be determined from the data in practice. We propose a novel high-dimensional BIC for tuning parameter selection and establish its consistency. This is the first high-dimensional BIC result under heteroscedasticity. Handling heteroscedasticity makes the theoretical analysis more involved, and we develop a couple of new concentration inequalities that are of independent interest.
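As a reading aid, the setting summarized above can be sketched in symbols. This is only one common way to formalize a heteroscedastic linear model and a cross-fitted residual regression; the notation (the coefficient vectors \beta and \gamma, the log-linear variance form, the two-fold split I_1, I_2, and the penalty level \lambda) is an illustrative assumption and is not taken from the article itself.

```latex
% Illustrative notation only (assumed, not the article's own).
% Mean model with covariate-dependent noise scale:
\[
  y_i = x_i^\top \beta + \sigma(x_i)\,\varepsilon_i,
  \qquad \mathbb{E}[\varepsilon_i \mid x_i] = 0,
  \quad \operatorname{Var}(\varepsilon_i \mid x_i) = 1 .
\]
% One possible sparse variance model (an assumption):
\[
  \log \sigma^2(x_i) = x_i^\top \gamma,
  \qquad \gamma \ \text{sparse}.
\]
% Cross-fitting idea: estimate the mean on one fold, form residuals on the other.
\[
  \hat r_i = y_i - x_i^\top \hat\beta^{(1)}, \quad i \in I_2,
  \qquad \hat\beta^{(1)} \ \text{fitted using only } \{(x_j, y_j) : j \in I_1\}.
\]
% Residual regression: a penalized fit of the log squared residuals on the covariates
% (the intercept absorbs the nonzero mean of \log \varepsilon_i^2).
\[
  (\hat\gamma_0, \hat\gamma) \in \arg\min_{\gamma_0,\,\gamma}
  \frac{1}{|I_2|} \sum_{i \in I_2}
  \bigl( \log \hat r_i^{\,2} - \gamma_0 - x_i^\top \gamma \bigr)^2
  + \lambda \lVert \gamma \rVert_1 .
\]
```

Under this sketch, the nonzero entries of \gamma play the role of the "heteroscedasticity effects" to be detected, and \lambda is the kind of tuning parameter that a BIC-type criterion would select. Fitting the mean and the residual regression on separate folds keeps the residuals independent of the mean estimate given the covariates, which is the usual motivation for cross-fitting.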
Scopus Subject Areas
- Statistics and Probability
- Statistics, Probability and Uncertainty
- High dimension
- Model selection criterion