Adaptive chunk-based dynamic weighted majority for imbalanced data streams with concept drift

Yang Lu, Yiu Ming CHEUNG*, Yuan Yan Tang

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)

Abstract

One of the most challenging problems in the field of online learning is concept drift, which deeply influences the classification stability of streaming data. If the data stream is imbalanced, it is even more difficult to detect concept drifts and make an online learner adapt to them. Ensemble algorithms have been found effective for the classification of streaming data with concept drift, whereby an individual classifier is built for each incoming data chunk and its associated weight is adjusted to manage the drift. However, it is difficult to adjust the weights to achieve a balance between the stability and adaptability of the ensemble classifiers. In addition, when the data stream is imbalanced, the use of a size-fixed chunk to build a single classifier can create further problems; the data chunk may contain too few or even no minority class samples (i.e., only majority class samples). A classifier built on such a chunk is unstable in the ensemble. In this article, we propose a chunk-based incremental learning method called adaptive chunk-based dynamic weighted majority (ACDWM) to deal with imbalanced streaming data containing concept drift. ACDWM utilizes an ensemble framework by dynamically weighting the individual classifiers according to their classification performance on the current data chunk. The chunk size is adaptively selected by statistical hypothesis tests to access whether the classifier built on the current data chunk is sufficiently stable. ACDWM has four advantages compared with the existing methods as follows: 1) it can maintain stability when processing nondrifted streams and rapidly adapt to the new concept; 2) it is entirely incremental, i.e., no previous data need to be stored; 3) it stores a limited number of classifiers to ensure high efficiency; and 4) it adaptively selects the chunk size in the concept drift environment. Experiments on both synthetic and real data sets containing concept drift show that ACDWM outperforms both state-of-the-art chunk-based and online methods.

Original languageEnglish
Article number8924892
Pages (from-to)2764-2778
Number of pages15
JournalIEEE Transactions on Neural Networks and Learning Systems
Volume31
Issue number8
DOIs
Publication statusPublished - Aug 2020

Scopus Subject Areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

User-Defined Keywords

  • Concept drift
  • ensemble methods
  • imbalance learning
  • online learning

Fingerprint

Dive into the research topics of 'Adaptive chunk-based dynamic weighted majority for imbalanced data streams with concept drift'. Together they form a unique fingerprint.

Cite this