Hybrid sampling with bagging for class imbalance learning

Yang Lu, Yiu Ming CHEUNG*, Yuan Yan Tang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

25 Citations (Scopus)

Abstract

For the class imbalance problem, integrating sampling with ensemble methods has proven highly successful. Nevertheless, neither of the two representative sampling methods, undersampling and oversampling, consistently outperforms the other: undersampling suits some data sets, while oversampling suits others. In addition, the sampling rate significantly influences classifier performance, yet existing methods usually adopt the full sampling rate to produce a balanced training set. In this paper, we propose a new algorithm that uses a hybrid scheme of undersampling and oversampling, with sampling rate selection, to preprocess the data in each ensemble iteration. Bagging is adopted as the ensemble framework because sampling rate selection can benefit from the out-of-bag estimate in bagging. The proposed method thus combines undersampling and oversampling with a sampling rate selected specifically for each data set. Experiments are conducted on 26 data sets from the UCI data repository, in which the proposed method is compared with existing counterparts using three evaluation metrics. The results show that, combined with bagging, the proposed hybrid sampling method significantly outperforms other state-of-the-art bagging-based methods for the class imbalance problem. The benefit of sampling rate selection is also demonstrated.
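The abstract does not spell out the sampling scheme or the selection procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes binary labels, a midpoint target size for the hybrid step, random duplication as the oversampler, a k-nearest-neighbour base learner, and balanced accuracy as the out-of-bag score. All function names (`hybrid_resample`, `oob_score`, `select_rate`) are hypothetical.

```python
import random
from statistics import mean


def hybrid_resample(X, y, rate, rng):
    """Shrink the larger class and grow the smaller one toward a shared target.

    rate=0 leaves class sizes unchanged; rate=1 gives a balanced set.
    The midpoint target is an assumption of this sketch.
    """
    groups = {}
    for i, lab in enumerate(y):
        groups.setdefault(lab, []).append(i)
    if len(groups) != 2:  # a bootstrap sample may miss a class entirely
        return list(X), list(y)
    small, large = sorted(groups.values(), key=len)
    target = (len(small) + len(large)) / 2
    n_large = round(len(large) - rate * (len(large) - target))
    n_small = round(len(small) + rate * (target - len(small)))
    keep = rng.sample(large, n_large)  # undersample without replacement
    extra = [rng.choice(small) for _ in range(n_small - len(small))]  # duplicate
    idx = keep + small + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def knn_predict(Xtr, ytr, x, k=5):
    """Majority vote among the k nearest training points (assumed base learner)."""
    dist = lambda i: sum((a - b) ** 2 for a, b in zip(Xtr[i], x))
    votes = [ytr[i] for i in sorted(range(len(Xtr)), key=dist)[:k]]
    return max(set(votes), key=votes.count)


def oob_score(X, y, rate, n_bags=15, seed=0):
    """Balanced accuracy of the bagged ensemble on out-of-bag samples."""
    rng = random.Random(seed)
    n = len(y)
    votes = [[] for _ in range(n)]
    for _ in range(n_bags):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb = hybrid_resample([X[i] for i in bag], [y[i] for i in bag], rate, rng)
        for i in set(range(n)) - set(bag):  # samples this bag never saw
            votes[i].append(knn_predict(Xb, yb, X[i]))
    recalls = []
    for lab in set(y):
        scored = [v for v, t in zip(votes, y) if t == lab and v]
        hits = sum(max(set(v), key=v.count) == lab for v in scored)
        recalls.append(hits / len(scored) if scored else 0.0)
    return mean(recalls)


def select_rate(X, y, rates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the candidate sampling rate with the best out-of-bag score."""
    return max(rates, key=lambda r: oob_score(X, y, r))


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    print("selected sampling rate:", select_rate(X, y))
```

Because each bootstrap sample leaves some training points unseen, the out-of-bag score can rank candidate rates without a separate validation split, which is the property the abstract gives as the reason for choosing bagging.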

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 20th Pacific-Asia Conference, PAKDD 2016, Proceedings
Editors: Takashi Washio, Joshua Zhexue Huang, Latifur Khan, Ruili Wang, James Bailey, Gillian Dobbie
Publisher: Springer Verlag
Pages: 14-26
Number of pages: 13
ISBN (Print): 9783319317526
Publication status: Published - 2016
Event: 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016 - Auckland, New Zealand
Duration: 19 Apr 2016 to 22 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9651
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016
Country/Territory: New Zealand
City: Auckland
Period: 19/04/16 to 22/04/16

Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

User-Defined Keywords

  • Class imbalance learning
  • Ensemble method
  • Hybrid sampling
  • Sampling method
