TY - JOUR
T1 - Self-Adaptive Multiprototype-Based Competitive Learning Approach
T2 - A k-Means-Type Algorithm for Imbalanced Data Clustering
AU - Lu, Yang
AU - Cheung, Yiu Ming
AU - Tang, Yuan Yan
N1 - Funding Information:
Manuscript received January 21, 2019; revised March 28, 2019; accepted May 7, 2019. Date of publication May 29, 2019; date of current version February 17, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61672444 and Grant 61272366, in part by the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project FRG2/17-18/082, in part by KTO Grant of HKBU under Project MPCF-004-2017/18, and in part by SZSTI under Grant JCYJ20160531194006833. This paper was recommended by Associate Editor N. Zhang. (Corresponding author: Yiu-Ming Cheung.) Y. Lu and Y.-M. Cheung are with the Department of Computer Science, Hong Kong Baptist University, Hong Kong.
PY - 2021/3
Y1 - 2021/3
N2 - The class imbalance problem has been extensively studied in recent years, but imbalanced data clustering in an unsupervised environment, that is, clustering in which the number of samples differs markedly among clusters, has yet to be well studied. This paper, therefore, studies the imbalanced data clustering problem within the framework of k-means-type competitive learning. We introduce a new method called self-adaptive multiprototype-based competitive learning (SMCL) for imbalanced clusters. It uses multiple subclusters to represent each cluster with an automatic adjustment of the number of subclusters. Then, the subclusters are merged into the final clusters based on a novel separation measure. We also propose a new internal clustering validation measure to determine the number of final clusters during the merging process for imbalanced clusters. The advantages of SMCL are threefold: 1) it inherits the advantages of competitive learning while remaining applicable to imbalanced data clustering; 2) the self-adaptive multiprototype mechanism uses a proper number of subclusters to represent each cluster of arbitrary shape; and 3) it automatically determines the number of clusters for imbalanced clusters. SMCL is compared with existing counterparts for imbalanced clustering on synthetic and real datasets. The experimental results show the efficacy of SMCL for imbalanced clusters.
AB - The class imbalance problem has been extensively studied in recent years, but imbalanced data clustering in an unsupervised environment, that is, clustering in which the number of samples differs markedly among clusters, has yet to be well studied. This paper, therefore, studies the imbalanced data clustering problem within the framework of k-means-type competitive learning. We introduce a new method called self-adaptive multiprototype-based competitive learning (SMCL) for imbalanced clusters. It uses multiple subclusters to represent each cluster with an automatic adjustment of the number of subclusters. Then, the subclusters are merged into the final clusters based on a novel separation measure. We also propose a new internal clustering validation measure to determine the number of final clusters during the merging process for imbalanced clusters. The advantages of SMCL are threefold: 1) it inherits the advantages of competitive learning while remaining applicable to imbalanced data clustering; 2) the self-adaptive multiprototype mechanism uses a proper number of subclusters to represent each cluster of arbitrary shape; and 3) it automatically determines the number of clusters for imbalanced clusters. SMCL is compared with existing counterparts for imbalanced clustering on synthetic and real datasets. The experimental results show the efficacy of SMCL for imbalanced clusters.
KW - Class imbalance learning
KW - competitive learning
KW - data clustering
KW - internal validation measure
KW - k-means-type algorithm
KW - multiprototype clustering
UR - http://www.scopus.com/inward/record.url?scp=85101058814&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2019.2916196
DO - 10.1109/TCYB.2019.2916196
M3 - Journal article
C2 - 31150353
AN - SCOPUS:85101058814
SN - 2168-2267
VL - 51
SP - 1598
EP - 1612
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 3
ER -