TY - JOUR
T1 - Robust model averaging prediction of longitudinal response with ultrahigh-dimensional covariates
AU - Jiang, Binyan
AU - Lv, Jing
AU - Li, Jialiang
AU - Cheng, Ming-Yen
N1 - This work is partially supported by the Natural Science Foundation of Chongqing (Grant No. cstc2021jcyj-msxmX0502, CSTB2022NSCQ-MSX0852), Fundamental Research Funds for the Central Universities (Grant No. SWUKU24002), the National Statistical Science Research Program (Grant No. 2022LY019), National Natural Science Foundation of China (Grant/Award Numbers: 72331005, 12071143), Ministry of Education Tier 1 Academic Research Funds A-8000016-00-00 and A-8001947-00-00 in Singapore, and the Research Grants Council grants HKBU12302621 and HKBU12302522.
N1 - Copyright: © The Royal Statistical Society 2024.
PY - 2025/4
Y1 - 2025/4
N2 - Model averaging is an attractive ensemble technique for constructing fast and accurate predictions. Despite having been widely practiced in cross-sectional data analysis, its application to longitudinal data is rather limited so far. We consider model averaging for longitudinal response when the number of covariates is ultrahigh. To this end, we propose a novel two-stage procedure in which variable screening is first conducted and then followed by model averaging. In both stages, a robust rank-based estimation function is introduced to cope with potential outliers and heavy-tailed error distributions, while the longitudinal correlation is modelled by a modified Cholesky decomposition method and properly incorporated to achieve efficiency. Asymptotic properties of our proposed methods are rigorously established, including screening consistency and convergence of the model averaging predictor, with uncertainties in the screening step and selected model set both taken into account. Extensive simulation studies demonstrate that our method outperforms existing competitors, resulting in significant improvements in screening and prediction performance. Finally, we apply our proposed framework to analyse a human microbiome dataset, showing the capability of our procedure in resolving robust prediction using massive metabolites.
AB - Model averaging is an attractive ensemble technique for constructing fast and accurate predictions. Despite having been widely practiced in cross-sectional data analysis, its application to longitudinal data is rather limited so far. We consider model averaging for longitudinal response when the number of covariates is ultrahigh. To this end, we propose a novel two-stage procedure in which variable screening is first conducted and then followed by model averaging. In both stages, a robust rank-based estimation function is introduced to cope with potential outliers and heavy-tailed error distributions, while the longitudinal correlation is modelled by a modified Cholesky decomposition method and properly incorporated to achieve efficiency. Asymptotic properties of our proposed methods are rigorously established, including screening consistency and convergence of the model averaging predictor, with uncertainties in the screening step and selected model set both taken into account. Extensive simulation studies demonstrate that our method outperforms existing competitors, resulting in significant improvements in screening and prediction performance. Finally, we apply our proposed framework to analyse a human microbiome dataset, showing the capability of our procedure in resolving robust prediction using massive metabolites.
KW - longitudinal data analysis
KW - microbiome data analysis
KW - model averaging
KW - rank regression
KW - robust estimation
KW - sure screening
UR - http://www.scopus.com/inward/record.url?scp=105002662886&partnerID=8YFLogxK
U2 - 10.1093/jrsssb/qkae094
DO - 10.1093/jrsssb/qkae094
M3 - Journal article
SN - 1369-7412
VL - 87
SP - 337
EP - 361
JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology
JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology
IS - 2
ER -