Abstract
Model averaging is an attractive ensemble technique for constructing fast and accurate predictions. Despite being widely practised in cross-sectional data analysis, its application to longitudinal data has so far been rather limited. We consider model averaging for longitudinal responses when the number of covariates is ultrahigh. To this end, we propose a novel two-stage procedure in which variable screening is conducted first, followed by model averaging. In both stages, a robust rank-based estimation function is introduced to cope with potential outliers and heavy-tailed error distributions, while the longitudinal correlation is modelled by a modified Cholesky decomposition method and properly incorporated to achieve efficiency. Asymptotic properties of our proposed methods are rigorously established, including screening consistency and convergence of the model averaging predictor, with uncertainties in both the screening step and the selected model set taken into account. Extensive simulation studies demonstrate that our method outperforms existing competitors, resulting in significant improvements in screening and prediction performance. Finally, we apply our proposed framework to analyse a human microbiome dataset, showing the capability of our procedure to achieve robust prediction from a large number of metabolites.
| Original language | English |
|---|---|
| Article number | qkae094 |
| Number of pages | 25 |
| Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
| DOIs | |
| Publication status | E-pub ahead of print - 11 Sept 2024 |
User-Defined Keywords
- longitudinal data analysis
- microbiome data analysis
- model averaging
- rank regression
- robust estimation
- sure screening