Self-Adaptive Multiprototype-Based Competitive Learning Approach: A k-Means-Type Algorithm for Imbalanced Data Clustering

Yang Lu, Yiu Ming Cheung*, Yuan Yan Tang

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

51 Citations (Scopus)

Abstract

The class imbalance problem has been extensively studied in recent years, but imbalanced data clustering in an unsupervised environment, that is, clustering when the number of samples differs greatly among clusters, has yet to be well studied. This paper therefore studies the imbalanced data clustering problem within the framework of k-means-type competitive learning. We introduce a new method called self-adaptive multiprototype-based competitive learning (SMCL) for imbalanced clusters. It uses multiple subclusters to represent each cluster and automatically adjusts the number of subclusters. The subclusters are then merged into the final clusters based on a novel separation measure. We also propose a new internal clustering validation measure to determine the number of final clusters during the merging process for imbalanced clusters. The advantages of SMCL are threefold: 1) it inherits the advantages of competitive learning while being applicable to imbalanced data clustering; 2) the self-adaptive multiprototype mechanism uses a proper number of subclusters to represent each cluster of arbitrary shape; and 3) it automatically determines the number of clusters for imbalanced clusters. SMCL is compared with existing counterparts for imbalanced clustering on synthetic and real datasets. The experimental results show the efficacy of SMCL for imbalanced clusters.
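The abstract does not give SMCL's update rules, separation measure, or validation measure, so none of them is reproduced here. As a rough illustration of the k-means-type competitive learning framework that the method builds on, the sketch below shows a generic winner-take-all prototype update in which several prototypes can cover one cluster. All names in it (`competitive_learning`, `assign`, `make_imbalanced_data`, the learning rate, the toy data) are assumptions made for illustration, not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(points, num_prototypes, learning_rate=0.05,
                         epochs=20, seed=0):
    """Generic winner-take-all competitive learning (online k-means style).

    This is NOT SMCL: it keeps a fixed number of prototypes and has no
    self-adaptive adjustment and no merging step.
    """
    rng = random.Random(seed)
    # Start each prototype at a randomly chosen data point.
    prototypes = [list(p) for p in rng.sample(points, num_prototypes)]
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = points[i]
            # Competition: the nearest prototype wins the sample.
            winner = min(range(num_prototypes),
                         key=lambda j: squared_distance(x, prototypes[j]))
            # Learning: only the winner moves a small step toward the sample.
            prototypes[winner] = [w + learning_rate * (xi - w)
                                  for w, xi in zip(prototypes[winner], x)]
    return prototypes


def assign(points, prototypes):
    """Label each point with the index of its nearest prototype."""
    return [min(range(len(prototypes)),
                key=lambda j: squared_distance(x, prototypes[j]))
            for x in points]


def make_imbalanced_data(seed=0):
    """Hypothetical toy data: one large cluster (500) and one small (25)."""
    rng = random.Random(seed)
    large = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
    small = [(rng.gauss(6.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(25)]
    return large + small


if __name__ == "__main__":
    data = make_imbalanced_data()
    # More prototypes than clusters, in the spirit of a multiprototype scheme.
    protos = competitive_learning(data, num_prototypes=6)
    labels = assign(data, protos)
    for j, p in enumerate(protos):
        print(j, [round(v, 2) for v in p], labels.count(j))
```

Running the script prints each prototype with the number of points it wins. Deciding which of these subclusters belong to the same final cluster is the merging step that the abstract attributes to SMCL's separation measure, and it is deliberately left out of this sketch.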

Original language: English
Pages (from-to): 1598-1612
Number of pages: 15
Journal: IEEE Transactions on Cybernetics
Volume: 51
Issue number: 3
Early online date: 29 May 2019
Publication status: Published - Mar 2021

Scopus Subject Areas

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

User-Defined Keywords

  • Class imbalance learning
  • competitive learning
  • data clustering
  • internal validation measure
  • k-means-type algorithm
  • multiprototype clustering
