Equalization ensemble for large scale highly imbalanced data classification

Jinjun Ren, Yuping Wang*, Mingqian Mao, Yiu-ming Cheung

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

22 Citations (Scopus)

Abstract

The class-imbalance problem arises in many research fields. The larger the data set and the more severe the imbalance, the harder it is to classify the data correctly. For large-scale, highly imbalanced data sets, ensemble methods based on under-sampling are among the most competitive existing techniques. However, they are sensitive to improper sampling strategies, prone to losing useful information from the majority class, and the learned model does not generalize easily. To overcome these limitations, we propose an equalization ensemble method (EASE) with two new schemes. First, we propose an equalization under-sampling scheme that generates a balanced data set for each base classifier, which reduces the impact of class imbalance on the base classifiers. Second, we design a weighted integration scheme in which the G-mean scores obtained by the base classifiers on the original imbalanced data set are used as the weights. These weights not only let the better-performing base classifiers dominate the final classification decision, but also adapt to imbalanced data sets of different scales while avoiding some extremely bad cases. Experimental results on three metrics show that EASE increases the diversity of base classifiers and outperforms twelve state-of-the-art methods on imbalanced data sets of different scales.
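The two schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the equalization under-sampling selects majority samples, so plain random under-sampling is used as a stand-in, and the base learner is a hypothetical one-dimensional threshold rule. All function names (`g_mean`, `balanced_subset`, `fit_stump`, `fit_ensemble`, `predict`) are illustrative, and the minority class is assumed to be labelled 1. G-mean is taken here in its usual binary form, the square root of sensitivity times specificity.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def balanced_subset(X, y, rng):
    """Stand-in for equalization under-sampling (assumption): keep every
    minority sample (label 1) and draw an equal number of majority samples."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    chosen = minority + rng.sample(majority, len(minority))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_stump(X, y):
    """Hypothetical base learner: threshold halfway between the class means."""
    n_pos = sum(y)
    mean_pos = sum(x for x, label in zip(X, y) if label == 1) / n_pos
    mean_neg = sum(x for x, label in zip(X, y) if label == 0) / (len(y) - n_pos)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0


def fit_ensemble(X, y, n_estimators=5, seed=0):
    """Train each base classifier on its own balanced subset and weight it
    by its G-mean on the original imbalanced data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_estimators):
        X_bal, y_bal = balanced_subset(X, y, rng)
        clf = fit_stump(X_bal, y_bal)
        weight = g_mean(y, [clf(x) for x in X])
        members.append((clf, weight))
    return members


def predict(members, x):
    """Weighted vote: predict 1 if the G-mean-weighted share of votes for
    class 1 reaches one half."""
    total = sum(weight for _, weight in members)
    if total == 0:
        return 0
    score = sum(weight * clf(x) for clf, weight in members) / total
    return 1 if score >= 0.5 else 0


# Toy data: 100 majority samples in [0, 1) and 5 minority samples near 6.
X = [i / 100 for i in range(100)] + [5.0, 5.5, 6.0, 6.5, 7.0]
y = [0] * 100 + [1] * 5
ensemble = fit_ensemble(X, y)
print(predict(ensemble, 6.0), predict(ensemble, 0.2))
```

Scoring each base classifier on the full imbalanced set, rather than on its own balanced subset, means a classifier that ignores either class receives a weight near zero, which is one plausible reading of how the weights avoid extremely bad cases.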

Original language: English
Article number: 108295
Journal: Knowledge-Based Systems
Volume: 242
Early online date: 31 Jan 2022
Publication status: Published - Apr 2022

Scopus Subject Areas

  • Software
  • Management Information Systems
  • Information Systems and Management
  • Artificial Intelligence

User-Defined Keywords

  • Ensemble learning
  • Imbalanced data classification
  • Large-scale data
  • Under-sampling
