Machine learning algorithms predict breast cancer incidence risk: a data-driven retrospective study based on biochemical biomarkers

  • Qianqian Guo (Co-first author)
  • Peng Wu (Co-first author)
  • Junhao He
  • Ge Zhang
  • Wu Zhou*
  • Qianjun Chen*

  * Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

5 Citations (Scopus)

Abstract

Background: Current breast cancer prediction models typically rely on personal information and medical history, with limited inclusion of blood-based biomarkers. This study aimed to identify novel breast cancer risk factors using machine learning algorithms. By integrating both personal clinical factors and peripheral blood biochemical biomarkers, it sought to enhance the understanding of breast cancer risk.

Methods: Data were screened and normalized according to predefined inclusion and exclusion criteria. Logistic regression with forward selection and six other machine learning algorithms were employed to identify variables associated with breast cancer incidence. The performance of the models was evaluated using the area under the curve (AUC) through 5-fold cross-validation.
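
The evaluation step described in the Methods (AUC averaged over 5-fold cross-validation) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the helper names (`auc`, `k_fold_indices`, `cross_validated_auc`), the single synthetic feature, and the toy "model" that only learns the direction of that feature are all assumptions made for the example. The study itself used logistic regression and six other algorithms, which are not reproduced here.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auc(x, y, k=5, seed=0):
    """Fit a toy one-feature model on k-1 folds and score AUC on the held-out fold."""
    folds = k_fold_indices(len(y), k, seed)
    fold_aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # Toy "training" (an assumption of this sketch): learn only whether
        # cases have higher or lower feature values than controls.
        case_mean = mean(x[j] for j in train_idx if y[j] == 1)
        ctrl_mean = mean(x[j] for j in train_idx if y[j] == 0)
        sign = 1.0 if case_mean >= ctrl_mean else -1.0
        fold_aucs.append(
            auc([y[j] for j in test_idx], [sign * x[j] for j in test_idx])
        )
    return mean(fold_aucs), fold_aucs


# Synthetic data (hypothetical): cases are shifted up by one standard deviation.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
x = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]

mean_auc, fold_aucs = cross_validated_auc(x, y, k=5)
print(f"mean AUC over 5 folds: {mean_auc:.3f}")
```

Each observation is held out exactly once, so the mean of the five fold AUCs estimates how well the scoring rule ranks unseen cases above unseen controls.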

Results: The data were divided into a training cohort of 17,360 cases and a testing cohort of 8,551 cases. Logistic regression analysis showed that breast cancer incidence increased with age (odds ratio [OR]: 1.136, 95% confidence interval [CI]: [1.130, 1.142], P < 0.001), gamma-glutamyl transferase (GGT) (OR: 1.002, 95% CI: [1.000, 1.004], P = 0.014), and alanine transaminase (ALT) (OR: 1.005, 95% CI: [1.001, 1.008], P = 0.008). Furthermore, the six machine learning algorithms consistently identified GGT and ALT as the most significant predictive features. After 5-fold cross-validation, the AUC values of the six models ranged from 0.779 to 0.862, and their accuracy ranged from 0.780 to 0.841.
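
Because an odds ratio from logistic regression is reported per one-unit increase in the predictor, values close to 1 can still matter over a wider range. The sketch below rescales the reported ORs to a 10-unit increase. It assumes the effect is log-linear over that range, and the function name `scaled_odds_ratio` is hypothetical. The abstract does not state the measurement units, so "unit" here is whatever unit the study used.

```python
import math


def scaled_odds_ratio(or_per_unit, delta):
    """Odds ratio for a `delta`-unit increase, assuming a log-linear effect."""
    beta = math.log(or_per_unit)  # logistic regression coefficient per unit
    return math.exp(beta * delta)


# Point estimates quoted in the Results above.
reported = {"age": 1.136, "GGT": 1.002, "ALT": 1.005}

for name, or_value in reported.items():
    print(f"{name}: OR per 10 units = {scaled_odds_ratio(or_value, 10):.3f}")
```

Under these assumptions, the age estimate compounds to roughly 3.58 over 10 units, while the GGT and ALT estimates stay close to 1 (about 1.02 and 1.05).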

Conclusions: Our study identified two biochemical biomarkers (GGT and ALT) as promising indicators for breast cancer prediction. Future research should incorporate these findings into a tailored breast cancer risk prediction model.

Original language: English
Article number: 1061
Number of pages: 11
Journal: BMC Cancer
Volume: 25
Issue number: 1
Publication status: Published - 1 Jul 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

User-Defined Keywords

  • Breast cancer
  • Biochemical biomarkers
  • Risk factor
  • Machine learning
  • Validation
