TY - JOUR
T1 - Adaptive chunk-based dynamic weighted majority for imbalanced data streams with concept drift
AU - Lu, Yang
AU - Cheung, Yiu Ming
AU - Tang, Yuan Yan
N1 - Funding Information:
Manuscript received April 30, 2018; revised January 29, 2019 and September 12, 2019; accepted November 2, 2019. Date of publication December 5, 2019; date of current version August 4, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61672444 and Grant 61272366, in part by the Hong Kong Baptist University (HKBU), Research Committee, Initiation Grant—Faculty Niche Research Areas (IG-FNRA) 2018/2019 under Grant RC-FNRA-IG/18-19/SCI/03, in part by the Innovation and Technology Fund of the Innovation and Technology Commission of the Government of the Hong Kong SAR under Project ITS/339/18, in part by the Faculty Research Grant of HKBU under Project FRG2/17-18/082, and in part by the Shenzhen Science, Technology and Innovation Committee (SZSTI) under Grant JCYJ20160531194006833. (Corresponding author: Yiu-Ming Cheung.) Y. Lu is with the Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen 361005, China, and also with the Department of Computer Science, Hong Kong Baptist University, Hong Kong (e-mail: [email protected]).
PY - 2020/8
Y1 - 2020/8
N2 - One of the most challenging problems in the field of online learning is concept drift, which deeply influences the classification stability of streaming data. If the data stream is imbalanced, it is even more difficult to detect concept drifts and make an online learner adapt to them. Ensemble algorithms have been found effective for the classification of streaming data with concept drift, whereby an individual classifier is built for each incoming data chunk and its associated weight is adjusted to manage the drift. However, it is difficult to adjust the weights to achieve a balance between the stability and adaptability of the ensemble classifiers. In addition, when the data stream is imbalanced, the use of a fixed-size chunk to build a single classifier can create further problems; the data chunk may contain too few or even no minority class samples (i.e., only majority class samples). A classifier built on such a chunk is unstable in the ensemble. In this article, we propose a chunk-based incremental learning method called adaptive chunk-based dynamic weighted majority (ACDWM) to deal with imbalanced streaming data containing concept drift. ACDWM utilizes an ensemble framework by dynamically weighting the individual classifiers according to their classification performance on the current data chunk. The chunk size is adaptively selected by statistical hypothesis tests to assess whether the classifier built on the current data chunk is sufficiently stable. Compared with existing methods, ACDWM has four advantages: 1) it can maintain stability when processing nondrifted streams and rapidly adapt to the new concept; 2) it is entirely incremental, i.e., no previous data need to be stored; 3) it stores a limited number of classifiers to ensure high efficiency; and 4) it adaptively selects the chunk size in the concept drift environment. Experiments on both synthetic and real data sets containing concept drift show that ACDWM outperforms both state-of-the-art chunk-based and online methods.
AB - One of the most challenging problems in the field of online learning is concept drift, which deeply influences the classification stability of streaming data. If the data stream is imbalanced, it is even more difficult to detect concept drifts and make an online learner adapt to them. Ensemble algorithms have been found effective for the classification of streaming data with concept drift, whereby an individual classifier is built for each incoming data chunk and its associated weight is adjusted to manage the drift. However, it is difficult to adjust the weights to achieve a balance between the stability and adaptability of the ensemble classifiers. In addition, when the data stream is imbalanced, the use of a fixed-size chunk to build a single classifier can create further problems; the data chunk may contain too few or even no minority class samples (i.e., only majority class samples). A classifier built on such a chunk is unstable in the ensemble. In this article, we propose a chunk-based incremental learning method called adaptive chunk-based dynamic weighted majority (ACDWM) to deal with imbalanced streaming data containing concept drift. ACDWM utilizes an ensemble framework by dynamically weighting the individual classifiers according to their classification performance on the current data chunk. The chunk size is adaptively selected by statistical hypothesis tests to assess whether the classifier built on the current data chunk is sufficiently stable. Compared with existing methods, ACDWM has four advantages: 1) it can maintain stability when processing nondrifted streams and rapidly adapt to the new concept; 2) it is entirely incremental, i.e., no previous data need to be stored; 3) it stores a limited number of classifiers to ensure high efficiency; and 4) it adaptively selects the chunk size in the concept drift environment. Experiments on both synthetic and real data sets containing concept drift show that ACDWM outperforms both state-of-the-art chunk-based and online methods.
KW - Concept drift
KW - ensemble methods
KW - imbalance learning
KW - online learning
UR - http://www.scopus.com/inward/record.url?scp=85089128769&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2019.2951814
DO - 10.1109/TNNLS.2019.2951814
M3 - Journal article
C2 - 31825880
AN - SCOPUS:85089128769
SN - 2162-237X
VL - 31
SP - 2764
EP - 2778
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 8
ER -