Development of an explainable machine learning model for predicting poststroke anxiety: A multicenter study using Shapley Additive Explanations and nomogram visualization

  • Mengke Lyu
  • Yanming Xie
  • Min Li
  • Christian Hölscher
  • Xiaoming Shen*
*Corresponding author for this work

Research output: Contribution to journal › Journal article › Peer-review

Abstract

Objective: Neuropsychiatric complications following stroke can impede recovery and reduce quality of life. Current predictive methods for poststroke anxiety (PSA) are limited by inadequate feature selection and a lack of interpretability. This study aimed to develop an interpretable machine learning model that draws on a wide range of clinical data to identify patients at high risk of PSA early, enabling personalized interventions.

Methods: This retrospective multicenter study included 238 stroke patients from 10 Chinese hospitals over the period from 1 January 2022 to 11 June 2025. Demographic, clinical, biochemical, and psychosocial data were collected. Features were selected by univariate analysis followed by least absolute shrinkage and selection operator (LASSO) regression. Seven machine learning models were constructed and evaluated using cross-validation: logistic regression, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest, decision tree, K-nearest neighbors, and stacking. Feature importance was determined using SHAP (Shapley Additive Explanations), and a nomogram was developed based on the final model.

Results: Of the 238 patients, 109 were diagnosed with PSA. On the test set, the logistic regression model performed best, with an area under the curve (AUC) of 0.981, accuracy of 0.917, sensitivity of 0.867, specificity of 0.952, and F1 score of 0.897. SHAP analysis identified the following as key predictors: recurrent stroke, income level, payment type, occupational stress, overwork, sleep quality, continuous drinking history, history of hypertension, diabetes, hyperlipidemia, hyperhomocysteinemia, white blood cell (WBC) count, total cholesterol (TC), low-density lipoprotein (LDL), fibrinogen (FIB), activated partial thromboplastin time (APTT), National Institutes of Health Stroke Scale (NIHSS) score, and Barthel index. A nomogram incorporating the top 10 SHAP-ranked features was developed to support clinical decision-making.

Conclusion: The machine learning model demonstrated high accuracy and interpretability in predicting PSA risk. By integrating SHAP analysis with nomogram visualization, it offers clinicians a practical tool for identifying patients at high risk of PSA and tailoring management strategies to improve poststroke outcomes.
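The abstract reports accuracy, sensitivity, specificity, F1 score, and AUC, and uses SHAP to explain a model whose best performer was logistic regression. The Python sketch below shows the conventional definitions of those metrics and, assuming independent features, the closed-form SHAP values of a logistic regression in log-odds space, φ_j = β_j (x_j − E[x_j]). It is an illustration under those assumptions, not the authors' code or the `shap` library's implementation; all counts, scores, coefficients, and feature values used with it are hypothetical and not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a binary confusion matrix (positive class = PSA)."""
    tp: int
    fp: int
    tn: int
    fn: int


def classification_metrics(c: Confusion) -> dict:
    """Accuracy, sensitivity, specificity and F1 as conventionally defined."""
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # true-positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true-negative rate
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def auc_rank(scores, labels) -> float:
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_odds(intercept, coefs, x):
    """Linear predictor (logit) of a fitted logistic regression."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


def linear_shap(coefs, x, background_means):
    """SHAP values in log-odds space, assuming independent features:
    phi_j = beta_j * (x_j - E[x_j])."""
    return [b * (xj - m) for b, xj, m in zip(coefs, x, background_means)]


def base_value(intercept, coefs, background_means):
    """Expected logit over the background data; base + sum(phi) = logit(x)."""
    return log_odds(intercept, coefs, background_means)
```

For example, `classification_metrics(Confusion(tp=8, fp=1, tn=10, fn=2))` returns a sensitivity of 0.8. The additivity check `base_value(...) + sum(linear_shap(...)) == log_odds(...)` holds by construction, which is the property that lets per-feature SHAP contributions be read alongside a nomogram's per-feature points.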

Original language: English
Number of pages: 21
Journal: Digital Health
Volume: 12
Publication status: Published - 8 Jan 2026

User-Defined Keywords

  • explainable AI
  • machine learning
  • nomogram
  • poststroke anxiety
  • risk prediction
  • SHAP
  • stroke rehabilitation

