The class-imbalanced classification is a difficult problem because not only traditional classifiers are more biased towards the majority classes and inclined to generate incorrect predictions, but also the existing algorithms often have difficulty tackling this kind of problem with the class overlapping. Oversampling is a widely used and effective method to obtain balanced samples for imbalanced data, but the existing oversampling methods usually result in more serious class overlapping due to improper choice of the reference samples. To circumvent this shortcoming, according to the different possibilities of minority class samples appearing in the overlapping regions in the feature space, a grouping scheme for the minority class samples is first designed to identify the overlapping region samples. Then, a new oversampling method based on this grouping scheme is proposed to make the new samples far away from the overlapping region and rectify the decision boundary properly. Subsequently, a new effective classification algorithm is developed for imbalanced data. Extensive experiments show that the proposed algorithm is superior to the seventeen benchmark algorithms in terms of three performance metrics, especially on high imbalance ratio data sets.
Scopus Subject Areas
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence
- Imbalanced data classification
- Kernel method
- Support vector machine