TY - JOUR
T1 - DBNX: A Machine Learning Method for Ensembling Polygenic Risk Scores and Non-Genetic Factors
AU - Yuan, Xiangzhe
AU - Wang, Chonghao
AU - Zhu, Shuqin
AU - Zhang, Lu
N1 - L.Z. is supported by a Young Collaborative Research grant (No. C2004-23Y), HMRF grant (No. 11221026), Guangdong-Hong Kong Technology Cooperation Funding Scheme (GHX/133/20SZ), an IRCMS HKBU (No. IRCMS/19-20/D02) and a HKBU Start-up Grant Tier 2 (RC-SGT2/19-20/SCI/007).
Publisher copyright:
© 2025 IEEE.
PY - 2025/6/16
Y1 - 2025/6/16
N2 - Polygenic risk scoring (PRS) holds promise for improving disease prediction and medical treatments by evaluating an individual's genetic susceptibility through multiple genetic variants. However, current PRS calculation methods often excel only in specific diseases and populations, with no single approach consistently outperforming others across all contexts. Furthermore, these methods frequently overlook non-genetic factors, such as lifestyle, that also impact disease risk. We introduce an unsupervised Deep Belief Network (DBN) to aggregate PRS generated by various methods, achieving performance comparable to the Super Learner method, a supervised ensemble approach that combines predictions from multiple methods to improve outcomes. Unlike supervised methods, the DBN does not require training data and can directly ensemble the available PRS. Remarkably, on small-scale datasets, the DBN outperforms the Super Learner. Additionally, we present the DBNX model, which integrates PRS with non-genetic factors using a combination of DBN and XGBoost. DBNX produces a Composite Risk Score (CRS) that incorporates information from both PRS and non-genetic factors. In our experiments using the UK Biobank (UKBB) dataset across four diseases, DBNX demonstrated superior performance compared to other commonly used ensemble methods. The code for DBNX is available at: https://github.com/Hangzen/DBNX
AB - Polygenic risk scoring (PRS) holds promise for improving disease prediction and medical treatments by evaluating an individual's genetic susceptibility through multiple genetic variants. However, current PRS calculation methods often excel only in specific diseases and populations, with no single approach consistently outperforming others across all contexts. Furthermore, these methods frequently overlook non-genetic factors, such as lifestyle, that also impact disease risk. We introduce an unsupervised Deep Belief Network (DBN) to aggregate PRS generated by various methods, achieving performance comparable to the Super Learner method, a supervised ensemble approach that combines predictions from multiple methods to improve outcomes. Unlike supervised methods, the DBN does not require training data and can directly ensemble the available PRS. Remarkably, on small-scale datasets, the DBN outperforms the Super Learner. Additionally, we present the DBNX model, which integrates PRS with non-genetic factors using a combination of DBN and XGBoost. DBNX produces a Composite Risk Score (CRS) that incorporates information from both PRS and non-genetic factors. In our experiments using the UK Biobank (UKBB) dataset across four diseases, DBNX demonstrated superior performance compared to other commonly used ensemble methods. The code for DBNX is available at: https://github.com/Hangzen/DBNX
KW - DBN
KW - Polygenic Risk Scores
KW - Ensemble Learning
KW - Machine Learning
U2 - 10.1109/TCBBIO.2025.3580193
DO - 10.1109/TCBBIO.2025.3580193
M3 - Journal article
SN - 2998-4165
JO - IEEE Transactions on Computational Biology and Bioinformatics
JF - IEEE Transactions on Computational Biology and Bioinformatics
ER -