TY - GEN
T1 - Quantifying predictive capability of electronic health records for the most harmful breast cancer
AU - Wu, Yirong
AU - Fan, Jun
AU - Peissig, Peggy
AU - Berg, Richard
AU - Tafti, Ahmad Pahlavan
AU - Yin, Jie
AU - Yuan, Ming
AU - Page, David
AU - Cox, Jennifer
AU - Burnside, Elizabeth S.
N1 - Funding Information:
The authors acknowledge the support of NIH grants U54AI117924 and K24CA194251 and NIH NCATS grant UL1TR000427. We also acknowledge support from the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education, with funding from the Wisconsin Alumni Research Foundation, and from the University of Wisconsin Carbone Comprehensive Cancer Center (P30CA014520).
PY - 2018
Y1 - 2018
N2 - Improved prediction of the "most harmful" breast cancers, those that cause the most substantial morbidity and mortality, would enable physicians to target more intensive screening and preventive measures at the women at highest risk; however, prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source with great research and clinical potential. Our goal was to quantify the value of EHR variables in predicting the risk of the "most harmful" breast cancer. From an existing personalized medicine data repository, we identified 794 subjects who had breast cancer with primary non-benign tumors and an earliest diagnosis on or after 1/1/2004, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising six components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. The AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both the Ridge-LR and Lasso-LR models, the predictive performance of the full set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, making it possible to personalize care in clinical practice for the women at highest risk.
KW - breast cancer
KW - electronic health records (EHRs)
KW - least absolute shrinkage and selection operator (Lasso)
KW - regularized prediction model
UR - http://www.scopus.com/inward/record.url?scp=85047868687&partnerID=8YFLogxK
U2 - 10.1117/12.2293954
DO - 10.1117/12.2293954
M3 - Conference proceeding
AN - SCOPUS:85047868687
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2018
A2 - Samuelson, Frank W.
A2 - Nishikawa, Robert M.
PB - SPIE
T2 - Medical Imaging 2018: Image Perception, Observer Performance, and Technology Assessment
Y2 - 11 February 2018 through 12 February 2018
ER -