Grouping-based Oversampling in Kernel Space for Imbalanced Data Classification

Jinjun Ren, Yuping Wang*, Yiu-ming Cheung, Xiao-Zhi Gao, Xiaofang Guo

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Peer-review

23 Citations (Scopus)

Abstract

Class-imbalanced classification is a difficult problem, not only because traditional classifiers are biased towards the majority classes and tend to make incorrect predictions, but also because existing algorithms often struggle when the classes overlap. Oversampling is a widely used and effective way to obtain balanced samples from imbalanced data, but existing oversampling methods often aggravate class overlapping because the reference samples are chosen improperly. To circumvent this shortcoming, a grouping scheme for the minority class samples is first designed to identify the samples in overlapping regions, based on how likely each minority class sample is to appear in an overlapping region of the feature space. Then, a new oversampling method based on this grouping scheme is proposed that generates new samples far away from the overlapping region and properly rectifies the decision boundary. Finally, a new and effective classification algorithm for imbalanced data is developed. Extensive experiments show that the proposed algorithm outperforms seventeen benchmark algorithms on three performance metrics, especially on data sets with high imbalance ratios.
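The abstract does not state the grouping rule, the kernel, or the oversampling formula, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. Assuming an RBF kernel, it measures distances in kernel space with the kernel trick, groups each minority sample by how many of its k nearest neighbours belong to the majority class (a rule similar to the safe/danger/noise grouping of Borderline-SMOTE), and interpolates new samples only between "safe" minority samples so that they stay away from the overlapping region. The function names, the group labels, the thresholds, and the parameters `k`, `gamma`, and `seed` are all hypothetical.

```python
import numpy as np


def rbf_kernel_distances(X, gamma=1.0):
    """Squared feature-space distances via the kernel trick (hypothetical helper).

    ||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2k(x,y), which equals
    2 - 2k(x,y) for the RBF kernel because k(x,x) = 1.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    K = np.exp(-gamma * np.maximum(d2, 0.0))          # RBF kernel matrix
    return 2.0 - 2.0 * K


def group_minority(X, y, minority_label=1, k=5, gamma=1.0):
    """Assign each minority sample to 'safe', 'overlap', or 'noise' (assumed rule)."""
    D = rbf_kernel_distances(X, gamma)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbour
    groups = {}
    for i in np.where(y == minority_label)[0]:
        nn = np.argsort(D[i])[:k]                     # k nearest neighbours in kernel space
        n_maj = int(np.sum(y[nn] != minority_label))  # majority-class neighbours
        if n_maj == k:
            groups[int(i)] = "noise"      # surrounded by the majority class
        elif n_maj >= k / 2:
            groups[int(i)] = "overlap"    # likely inside an overlapping region
        else:
            groups[int(i)] = "safe"       # mostly minority neighbours
    return groups


def oversample_safe(X, y, n_new, minority_label=1, k=5, gamma=1.0, seed=0):
    """Create n_new synthetic samples by interpolating between pairs of safe samples."""
    rng = np.random.default_rng(seed)
    groups = group_minority(X, y, minority_label, k, gamma)
    safe = [i for i, g in groups.items() if g == "safe"]
    if len(safe) < 2:
        return np.empty((0, X.shape[1]))
    Xs = X[safe]
    new = []
    for _ in range(n_new):
        a, b = rng.choice(len(safe), size=2, replace=False)  # two distinct safe samples
        lam = rng.random()                                   # interpolation weight in [0, 1)
        new.append(Xs[a] + lam * (Xs[b] - Xs[a]))
    return np.array(new)
```

For the RBF kernel, the feature-space distance 2 - 2k(x, y) is monotone in the Euclidean distance, so the neighbour ranking matches the input space; the kernel-trick formulation only changes the grouping for other kernels. Synthetic samples are produced in the input space here, which is a simplifying assumption of this sketch.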

Original language: English
Article number: 108992
Number of pages: 14
Journal: Pattern Recognition
Volume: 133
Early online date: 24 Aug 2022
Publication status: Published, Jan 2023

Scopus Subject Areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

User-Defined Keywords

  • Imbalanced data classification
  • Kernel method
  • Support vector machine
  • Oversampling
