Class imbalance is a widespread problem across many research fields. The larger the data set and the more severe the imbalance, the harder it is to classify correctly. For large-scale, highly imbalanced data sets, ensemble methods based on under-sampling are among the most competitive existing techniques. However, they are sensitive to improper sampling strategies, tend to lose useful information from the majority class, and produce learning models that do not generalize easily. To overcome these limitations, we propose an equalization ensemble method (EASE) with two new schemes. First, we propose an equalization under-sampling scheme that generates a balanced data set for each base classifier, which reduces the impact of class imbalance on the base classifiers. Second, we design a weighted integration scheme in which the G-mean scores obtained by the base classifiers on the original imbalanced data set are used as the weights. These weights not only let the better-performing base classifiers dominate the final classification decision, but also adapt to imbalanced data sets of different scales while avoiding extremely poor outcomes. Experimental results on three metrics show that EASE increases the diversity of base classifiers and outperforms twelve state-of-the-art methods on imbalanced data sets of different scales.
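The two ideas named in the abstract, a balanced training set for each base classifier and G-mean weights measured on the original imbalanced data, can be illustrated with a minimal sketch. This is not the EASE implementation: the abstract does not specify the equalization under-sampling scheme, so plain random under-sampling stands in for it, and the one-dimensional threshold learner, the function names, and the convention that class 1 is the minority are all assumptions made for illustration. G-mean is taken in its usual sense, the geometric mean of the recall on each class.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of recall on class 1 and recall on class 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def balanced_subset(X, y, rng):
    """Random under-sampling of the majority class (class 0).

    Stand-in only: the paper's equalization scheme is not described
    in the abstract and is NOT reproduced here.
    """
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    picked = minority + rng.sample(majority, len(minority))
    return [X[i] for i in picked], [y[i] for i in picked]


def fit_stump(X, y):
    """Hypothetical toy base learner: threshold at the midpoint of class means."""
    ones = [x for x, label in zip(X, y) if label == 1]
    zeros = [x for x, label in zip(X, y) if label == 0]
    mean1 = sum(ones) / len(ones)
    mean0 = sum(zeros) / len(zeros)
    threshold = (mean0 + mean1) / 2
    sign = 1 if mean1 >= mean0 else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0


def fit_ensemble(X, y, n_estimators=5, seed=0):
    """Train each member on a balanced subset, weight it by G-mean on the full data."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_estimators):
        Xb, yb = balanced_subset(X, y, rng)
        clf = fit_stump(Xb, yb)
        # The weight is scored on the ORIGINAL imbalanced set, not the subset.
        weight = g_mean(y, [clf(x) for x in X])
        members.append((clf, weight))
    return members


def predict(members, x):
    """G-mean-weighted vote; returns class 1 if the weighted share reaches 0.5."""
    total = sum(w for _, w in members)
    if total == 0:
        return 0
    score = sum(w * clf(x) for clf, w in members) / total
    return 1 if score >= 0.5 else 0


# Toy data: 20 majority points (class 0) and 4 minority points (class 1).
X = [float(v) for v in range(20)] + [30.0, 31.0, 32.0, 33.0]
y = [0] * 20 + [1] * 4
members = fit_ensemble(X, y)
```

Because G-mean collapses to zero whenever a classifier ignores one class entirely, a member that predicts only the majority class receives no say in the vote, which is one plausible reading of how such weights avoid extremely poor outcomes.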
Scopus Subject Areas
- Management Information Systems
- Information Systems and Management
- Artificial Intelligence
Keywords
- Ensemble learning
- Imbalanced data classification
- Large-scale data