TY - GEN
T1 - Hybrid sampling with bagging for class imbalance learning
AU - Lu, Yang
AU - Cheung, Yiu Ming
AU - Tang, Yuan Yan
N1 - Funding Information: This work was supported by the Faculty Research Grants of Hong Kong Baptist University (HKBU): FRG2/14-15/075, and by the National Natural Science Foundation of China under Grant Number: 61272366.
PY - 2016
Y1 - 2016
AB - For the class imbalance problem, integrating sampling with ensemble methods has proven highly successful. Nevertheless, neither of the two representative sampling methods, undersampling and oversampling, consistently outperforms the other: undersampling suits some data sets, while oversampling suits others. In addition, the sampling rate significantly influences classifier performance, yet existing methods usually adopt the full sampling rate to produce a balanced training set. In this paper, we propose a new algorithm that uses a hybrid scheme of undersampling and oversampling, with sampling rate selection, to preprocess the data in each ensemble iteration. Bagging is adopted as the ensemble framework because sampling rate selection can benefit from the Out-Of-Bag estimate in bagging. The proposed method thus combines undersampling and oversampling with a sampling rate selected specifically for each data set. Experiments are conducted on 26 data sets from the UCI data repository, in which the proposed method is compared with existing counterparts using three evaluation metrics. The results show that, combined with bagging, the proposed hybrid sampling method significantly outperforms other state-of-the-art bagging-based methods for the class imbalance problem. The benefit of sampling rate selection is also demonstrated.
KW - Class imbalance learning
KW - Ensemble method
KW - Hybrid sampling
KW - Sampling method
UR - http://www.scopus.com/inward/record.url?scp=84963994580&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-31753-3_2
DO - 10.1007/978-3-319-31753-3_2
M3 - Conference proceeding
AN - SCOPUS:84963994580
SN - 9783319317526
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 14
EP - 26
BT - Advances in Knowledge Discovery and Data Mining - 20th Pacific-Asia Conference, PAKDD 2016, Proceedings
A2 - Washio, Takashi
A2 - Huang, Joshua Zhexue
A2 - Khan, Latifur
A2 - Wang, Ruili
A2 - Bailey, James
A2 - Dobbie, Gillian
PB - Springer Verlag
T2 - 20th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2016
Y2 - 19 April 2016 through 22 April 2016
ER -