Abstract
Background and objectives: Traditional Chinese medicine (TCM) is a classical personalized medicine paradigm with a long history of managing various conditions (e.g., COVID-19 and heart failure (HF)). The large pool of herbal prescription data accumulated in electronic medical records (EMRs) provides an opportunity to develop novel regimens through data mining. However, TCM drug discovery is complicated by the complexity of the ingredients contained in TCM prescriptions. A causal inference framework that integrates clinical reasoning with the underlying biochemistry of herbal prescriptions is essential for identifying the core effective ingredients from raw herbs.
Materials and methods: We propose a novel causal learning approach, real world evidence-based effective Chemical Screener (rweChemScreener), to identify the effective ingredients in TCM treatments using real-world data from EMRs. Two real-world inpatient registries of patients with COVID-19 and HF treated with TCM were used as datasets. rweChemScreener uses high-dimensional mediation analysis to identify and validate herbal ingredients that mediate the therapeutic effects of TCM treatments. These mediating ingredients were derived from a mapping process that integrated prescription records with a herb-ingredient knowledge graph. In addition, a proxy therapeutic mediator variable was introduced by mapping the ingredients via an eXtreme Gradient Boosting (XGBoost) regressor. The contribution of each ingredient to the mediation effect was estimated using SHapley Additive exPlanations (SHAP), and the ingredients with the highest importance were considered the potential effective ingredients of TCM treatments.
Results: Multiple experimental assessments on semi-synthetic datasets demonstrated that rweChemScreener estimated the mediating effect more accurately than the baseline model. We identified the top six and nine potential effective herbal ingredients for COVID-19 (e.g., apicidin, limonin, and tricin) and HF (e.g., rutin, beta-sitosterol, and salicylic acid), respectively. The effects of these ingredients were supported by subsequent literature.
Conclusion: This study suggests that rweChemScreener, a novel causal machine learning approach, is capable of screening effective chemical ingredients of TCM herbal treatments through high-dimensional mediation analysis. The key contribution of this study lies in screening for potential effective ingredients guided directly by real-world clinical efficacy, by integrating real-world clinical and basic research data.
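The mapping step described in the methods, in which prescription records are joined to a herb-ingredient knowledge graph to obtain candidate mediating ingredients, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the herb and ingredient names are placeholders rather than real chemistry, the function name `ingredient_exposure` is hypothetical, and binary presence/absence coding is a simplification that may differ from the encoding used in the study.

```python
# Hypothetical herb -> ingredient edges standing in for a herb-ingredient
# knowledge graph. All names are placeholders, not real chemistry.
HERB_INGREDIENTS = {
    "herb_A": {"ingredient_1", "ingredient_2"},
    "herb_B": {"ingredient_2", "ingredient_3"},
    "herb_C": {"ingredient_4"},
}

# Hypothetical prescription records: patient id -> herbs prescribed.
PRESCRIPTIONS = {
    "patient_1": ["herb_A", "herb_B"],
    "patient_2": ["herb_C"],
    "patient_3": [],
}


def ingredient_exposure(prescriptions, herb_ingredients):
    """Return (sorted ingredient names, {patient: binary exposure vector})."""
    # Collect every ingredient in the graph and fix a column order.
    ingredients = sorted(set().union(*herb_ingredients.values()))
    index = {name: i for i, name in enumerate(ingredients)}

    exposure = {}
    for patient, herbs in prescriptions.items():
        row = [0] * len(ingredients)
        for herb in herbs:
            # Herbs missing from the graph contribute no ingredients.
            for ingredient in herb_ingredients.get(herb, ()):
                row[index[ingredient]] = 1
        exposure[patient] = row
    return ingredients, exposure


ingredients, exposure = ingredient_exposure(PRESCRIPTIONS, HERB_INGREDIENTS)
for patient, row in exposure.items():
    print(patient, dict(zip(ingredients, row)))
```

Each row of the resulting matrix marks which ingredients a patient was exposed to through the prescribed herbs. In a pipeline like the one the abstract outlines, such columns would serve as the high-dimensional candidate mediators. Dosage and ingredient concentration are not modeled in this sketch.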
| Original language | English |
|---|---|
| Article number | 157225 |
| Number of pages | 13 |
| Journal | Phytomedicine |
| Volume | 147 |
| Early online date | 12 Sept 2025 |
| DOIs | |
| Publication status | Published - Nov 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
User-Defined Keywords
- COVID-19
- Effective herb ingredient screening
- Heart failure
- High-dimension mediation analysis
- Real-world clinical data
- Traditional Chinese medicine
Title
rweChemScreener: high-dimension mediation analysis detects potential effective chemical ingredients of traditional Chinese medicine from real-world clinical data