TY - JOUR
T1 - Faecal microbiome-based machine learning for multi-class disease diagnosis
AU - Su, Qi
AU - Liu, Qin
AU - Lau, Raphaela Iris
AU - Zhang, Jingwan
AU - Xu, Zhilu
AU - Yeoh, Yun Kit
AU - Leung, Thomas W.H.
AU - Tang, Whitney
AU - Zhang, Lin
AU - Liang, Jessie Q.Y.
AU - Yau, Yuk Kam
AU - Zheng, Jiaying
AU - Liu, Chengyu
AU - Zhang, Mengjing
AU - Cheung, Chun Pan
AU - Ching, Jessica Y.L.
AU - Tun, Hein M.
AU - Yu, Jun
AU - Chan, Francis K.L.
AU - Ng, Siew C.
N1 - Funding Information:
We thank Gabriel Lee for manuscript proofreading. We thank Anki Miu, Bonaventure YM Ip, Joyce Wing Yan Mak, Paul KS Chan and other clinical research staff/students for their technical contribution to this study, including clinical data and sample collection, inventory and processing. This research has been conducted using the CU-Med Biobank Resource under Request ID ‘R20221008’. Q.S., Q.L., J.Z., Z.X., Y.K.Y., W.T., L.Z., Y.K.Y., C.L., M.Z., C.P.C., H.M.T., F.K.L.C. and S.C.N. are partially or fully supported by InnoHK, The Government of Hong Kong, Special Administrative Region of the People’s Republic of China. S.C.N. is also supported by the Croucher Senior Medical Research Fellowship. R.I.L. received additional support from the Hong Kong Ph.D. Fellowship Scheme (HKPFS).
Publisher Copyright:
© The Author(s) 2022
PY - 2022/11/10
Y1 - 2022/11/10
AB - Systemic characterisation of the human faecal microbiome provides the opportunity to develop non-invasive approaches in the diagnosis of a major human disease. However, shared microbial signatures across different diseases make accurate diagnosis challenging in single-disease models. Herein, we present a machine-learning multi-class model using a faecal metagenomic dataset of 2,320 individuals with nine well-characterised phenotypes, including colorectal cancer, colorectal adenomas, Crohn’s disease, ulcerative colitis, irritable bowel syndrome, obesity, cardiovascular disease, post-acute COVID-19 syndrome and healthy individuals. Our processed data covers 325 microbial species derived from 14.3 terabytes of sequence. The trained model achieves an area under the receiver operating characteristic curve (AUROC) of 0.90 to 0.99 (interquartile range, IQR, 0.91–0.94) in predicting different diseases in the independent test set, with a sensitivity of 0.81 to 0.95 (IQR, 0.87–0.93) at a specificity of 0.76 to 0.98 (IQR, 0.83–0.95). Metagenomic analysis of public datasets comprising 1,597 samples across different populations yields comparable predictions, with AUROC of 0.69 to 0.91 (IQR, 0.79–0.87). Correlation of the top 50 microbial species with disease phenotypes identifies 363 significant associations (FDR < 0.05). This microbiome-based multi-disease model has potential clinical application in disease diagnostics and treatment response monitoring and warrants further exploration.
UR - http://www.scopus.com/inward/record.url?scp=85141570060&partnerID=8YFLogxK
U2 - 10.1038/s41467-022-34405-3
DO - 10.1038/s41467-022-34405-3
M3 - Journal article
C2 - 36357393
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
M1 - 6818
ER -