TY - JOUR
T1 - Novel machine learning models outperform risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis
AU - Wong, Grace Lai Hung
AU - Hui, Vicki Wing-Ki
AU - Tan, Qingxiong
AU - Xu, Jingwen
AU - LEE, Hye Won
AU - Yip, Terry Cheuk-Fung
AU - Yang, Baoyao
AU - Tse, Yee Kit
AU - Yin, Chong
AU - Lyu, Fei
AU - Lai, Jimmy Che-To
AU - Lui, Grace Chung-Yan
AU - Chan, Henry Lik-Yuen
AU - Yuen, Pong Chi
AU - Wong, Vincent Wai-Sun
N1 - HADCL provided data, tools, platforms, health informatics, and professional support. All data were anonymised and not identifiable.
© 2022 The Author(s). Published by Elsevier B.V. on behalf of the European Association for the Study of the Liver (EASL).
Conflict of Interest:
GW has served as an advisory committee member for Gilead Sciences and Janssen; has served as a speaker for Abbott, AbbVie, Bristol-Myers Squibb, Echosens, Furui, Gilead Sciences, Janssen, and Roche; and has received a research grant from Gilead Sciences. TY has served as a speaker and an advisory committee member for Gilead Sciences. GL has served as an advisory committee member for Gilead, as a speaker for Merck and Gilead, and has received a research grant from Gilead. HC is an advisor for AbbVie, Aligos, Aptorum, Arbutus, Hepion, Janssen, Gilead, GSK, Merck, Roche, Vaccitech, VenatoRx, and Vir Biotechnology, and a speaker for Mylan, Gilead, and Roche. VW has served as an advisory committee member for AbbVie, Allergan, Echosens, Gilead Sciences, Janssen, Perspectum Diagnostics, Pfizer, and Terns, and as a speaker for Bristol-Myers Squibb, Echosens, Gilead Sciences, and Merck. The other authors declare that they have no competing interests.
Funding Information:
This work was supported by the Health and Medical Research Fund (HMRF) of the Food and Health Bureau (reference no.: 07180216) awarded to GW.
Publisher Copyright:
© 2022 The Author(s)
PY - 2022/3/1
Y1 - 2022/3/1
N2 - Background & Aims: Accurate hepatocellular carcinoma (HCC) risk prediction facilitates an appropriate surveillance strategy and reduces cancer mortality. We aimed to derive and validate novel machine learning models to predict HCC in a territory-wide cohort of patients with chronic viral hepatitis (CVH) using data from the Hospital Authority Data Collaboration Lab (HADCL). Methods: This was a territory-wide, retrospective, observational cohort study of patients with CVH in Hong Kong from 2000 to 2018, identified from HADCL on the basis of viral markers, diagnosis codes, and antiviral treatment for chronic hepatitis B and/or C. The cohort was randomly split into training and validation cohorts in a 7:3 ratio. Five commonly used machine learning methods, namely logistic regression, ridge regression, AdaBoost, decision tree, and random forest, were applied and compared to identify the best prediction model. Results: A total of 124,006 patients with CVH with complete data were included to build the models. In the training cohort (n = 86,804; 6,821 HCC), ridge regression (area under the receiver operating characteristic curve [AUROC] 0.842), decision tree (0.952), and random forest (0.992) performed best. In the validation cohort (n = 37,202; 2,875 HCC), ridge regression (AUROC 0.844) and random forest (0.837) maintained their accuracy, which was significantly higher than that of the existing HCC risk scores: CU-HCC (0.672), GAG-HCC (0.745), REACH-B (0.671), PAGE-B (0.748), and REAL-B (0.712). The low cut-off (0.07) of the HCC ridge score (HCC-RS) achieved 90.0% sensitivity and 98.6% negative predictive value (NPV) in the validation cohort. The high cut-off (0.15) of HCC-RS achieved high specificity (90.0%) and NPV (95.6%); 31.1% of patients, with scores between the two cut-offs, remained indeterminate. Conclusions: HCC-RS from the ridge regression machine learning model accurately predicted HCC in patients with CVH. These machine learning models may be developed as built-in functional keys or calculators in electronic health systems to reduce cancer mortality. Lay summary: Novel machine learning models generated accurate risk scores for hepatocellular carcinoma (HCC) in patients with chronic viral hepatitis. The HCC ridge score was consistently more accurate than existing HCC risk scores. These models may be incorporated into electronic health systems to develop appropriate cancer surveillance strategies and reduce cancer deaths.
KW - Antiviral treatment
KW - Cirrhosis
KW - Liver cancer
KW - Mortality
KW - World Health Organization
UR - http://www.scopus.com/inward/record.url?scp=85124251425&partnerID=8YFLogxK
U2 - 10.1016/j.jhepr.2022.100441
DO - 10.1016/j.jhepr.2022.100441
M3 - Journal article
SN - 2589-5559
VL - 4
JO - JHEP Reports
JF - JHEP Reports
IS - 3
M1 - 100441
ER -