A machine learning model for disease risk prediction by integrating genetic and non-genetic factors

Yu Xu, Chonghao Wang, Zeming Li, Yunpeng Cai, Ouzhou Young, Aiping Lyu, Lu Zhang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

3 Citations (Scopus)

Abstract

Polygenic risk scores (PRSs) have been widely used to identify high-risk individuals in the general population, which can support disease prevention and early treatment. Many methods have been developed to calculate PRSs by weighting and aggregating phenotype-associated risk alleles identified by genome-wide association studies. However, considering genetic effects alone may not be sufficient for risk prediction, because disease risk depends not only on genetic factors but also on non-genetic factors such as diet and physical exercise. Integrating these genetic and non-genetic factors into a unified machine learning framework for disease risk prediction remains a challenge. In this paper, we propose PRSIMD (PRS Integrating Multi-source Data), a machine learning model that applies posterior regularization to integrate genetic and non-genetic factors and thereby improve disease risk prediction. We also applied Mendelian randomization analysis to identify causal non-genetic risk factors for the selected diseases. We applied PRSIMD to predict type 2 diabetes and coronary artery disease in the UK Biobank and observed that it performed significantly better than existing PRS methods. In addition, PRSIMD achieved better predictive power than the composite risk score. The code for PRSIMD is available at: https://github.com/ericcombiolab/PRSIMD
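The abstract relies on two standard building blocks: a PRS computed by weighting and summing risk alleles, and Mendelian randomization (MR) for causal inference. The sketch below shows both in their textbook forms. It assumes genotypes are coded as risk-allele dosages (0, 1, 2) and that GWAS effect sizes serve as weights. For MR it uses the fixed-effect inverse-variance weighted (IVW) estimator, which is an assumption, because the abstract does not say which MR method the authors used. It does not implement PRSIMD's posterior regularization, and the function names are hypothetical.

```python
import numpy as np


def polygenic_risk_score(dosages, weights):
    """Classic additive PRS: a weighted sum of risk-allele dosages.

    dosages: (n_individuals, n_snps) risk-allele counts coded 0, 1 or 2.
    weights: (n_snps,) per-allele effect sizes, e.g. GWAS log odds ratios.
    Returns one score per individual. Generic illustration, not PRSIMD.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("number of SNPs in dosages and weights differs")
    return dosages @ weights


def ivw_mendelian_randomization(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each SNP gives a Wald ratio beta_outcome / beta_exposure; IVW combines
    the ratios with weights beta_exposure**2 / se_outcome**2.
    Assumed method for illustration; the source does not name its MR method.
    Returns (causal_effect, standard_error).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx**2 / se**2
    wald_ratios = by / bx
    estimate = np.sum(weights * wald_ratios) / np.sum(weights)
    standard_error = np.sqrt(1.0 / np.sum(weights))
    return estimate, standard_error


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the UK Biobank.
    scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, 0.2, 0.3])
    print("PRS per individual:", scores)

    effect, se = ivw_mendelian_randomization(
        beta_exposure=[0.1, 0.2, 0.4],
        beta_outcome=[0.05, 0.1, 0.2],
        se_outcome=[0.01, 0.02, 0.01],
    )
    print(f"IVW causal estimate: {effect:.3f} (SE {se:.4f})")
```

In the toy MR data every SNP has a Wald ratio of exactly 0.5, so the IVW estimate is 0.5; SNPs measured with smaller outcome standard errors and larger exposure effects dominate the weighted average. A model such as PRSIMD would go further by combining genetic scores with non-genetic covariates, which this sketch does not attempt.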

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022
Editors: Donald Adjeroh, Qi Long, Xinghua Mindy Shi, Fei Guo, Xiaohua Hu, Srinivas Aluru, Giri Narasimhan, Jianxin Wang, Mingon Kang, Ananda M. Mondal, Jin Liu
Publisher: IEEE
Pages: 868-871
Number of pages: 4
ISBN (Electronic): 9781665468190
ISBN (Print): 9781665468206
Publication status: Published - Dec 2022
Event: 2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022 - Las Vegas, United States
Duration: 6 Dec 2022 - 8 Dec 2022
https://ieeexplore.ieee.org/xpl/conhome/9994793/proceeding

Publication series

Name: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM

Conference

Conference: 2022 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2022
Country/Territory: United States
City: Las Vegas
Period: 6/12/22 - 8/12/22

Scopus Subject Areas

  • Psychiatry and Mental health
  • Information Systems and Management
  • Biomedical Engineering
  • Medicine (miscellaneous)
  • Cardiology and Cardiovascular Medicine
  • Health Informatics

User-Defined Keywords

  • multi-source data
  • polygenic risk score
  • posterior regularization

