TY - JOUR
T1 - DBNX: A Machine Learning Method for Ensembling Polygenic Risk Scores and Non-Genetic Factors
AU - Yuan, Xiangzhe
AU - Wang, Chonghao
AU - Zhu, Shuqin
AU - Zhang, Lu
N1 - This work was supported in part by TCFS under Grant GHX/133/20SZ, in part by Young Collaborative Research under Grant C2004-23Y, and in part by HMRF seed under Grant 11221026. The work of Lu Zhang was supported in part by the Young Collaborative Research under Grant C2004-23Y, in part by HMRF under Grant 11221026, in part by Guangdong-Hong Kong Technology Cooperation Funding Scheme under Grant GHX/133/20SZ, in part by IRCMS HKBU under Grant IRCMS/19-20/D02, and in part by HKBU Start-up Grant Tier 2 under Grant RC-SGT2/19-20/SCI/007.
Publisher copyright:
© 2025 The Authors.
PY - 2025/9
Y1 - 2025/9
AB - Polygenic risk scoring (PRS) holds promise for improving disease prediction and medical treatments by evaluating an individual’s genetic susceptibility through multiple genetic variants. However, current PRS calculation methods often excel only in specific diseases and populations, with no single approach consistently outperforming others across all contexts. Furthermore, these methods frequently overlook non-genetic factors, such as lifestyle, that also impact disease risk. We introduce an unsupervised Deep Belief Network (DBN) to aggregate PRS generated by various methods, achieving performance comparable to the Super Learner method, a supervised ensemble approach that combines predictions from multiple methods to improve outcomes. Unlike supervised methods, the DBN does not require training data and can directly ensemble the available PRS. Remarkably, on small-scale datasets, the DBN outperforms the Super Learner. Additionally, we present the DBNX model, which integrates PRS with non-genetic factors using a combination of DBN and XGBoost. DBNX produces a Composite Risk Score (CRS) that incorporates information from both PRS and non-genetic factors. In our experiments using the U.K. Biobank (UKBB) dataset across four diseases, DBNX demonstrated superior performance compared to other commonly used ensemble methods.
KW - DBN
KW - Ensemble Learning
KW - Machine Learning
KW - Polygenic Risk Scores
UR - http://www.scopus.com/inward/record.url?scp=105019207097&partnerID=8YFLogxK
U2 - 10.1109/TCBBIO.2025.3580193
DO - 10.1109/TCBBIO.2025.3580193
M3 - Journal article
SN - 2998-4165
VL - 22
SP - 2031
EP - 2042
JO - IEEE Transactions on Computational Biology and Bioinformatics
JF - IEEE Transactions on Computational Biology and Bioinformatics
IS - 5
ER -