TY - JOUR
T1 - Adaptive micro partition and hierarchical merging for accurate mixed data clustering
AU - Zhang, Yunfan
AU - Zou, Rong
AU - Zhang, Yiqun
AU - Zhang, Yue
AU - Cheung, Yiu Ming
AU - Li, Kangshun
N1 - This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62476063 and 62102097, the NSFC/Research Grants Council (RGC) Joint Research Scheme under Grant N_HKBU214/21, the Natural Science Foundation of Guangdong Province under Grant 2023A1515012855, the General Research Fund of RGC under Grants 12201321, 12202622, and 12201323, and the RGC Senior Research Fellow Scheme under Grant SRFS2324-2S02.
Publisher Copyright:
© The Author(s) 2024.
PY - 2025/1
Y1 - 2025/1
N2 - Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To address the complex real-world clustering task, this paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM) based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, we present a distance metric unified on numerical and categorical attributes to leverage neighborhood rough sets in partitioning data objects into fine-grained compact clusters. Then, we gradually merge the current most similar clusters to avoid incorporating dissimilar objects into a similar cluster. It turns out that the proposed approach breaks through the clustering performance bottleneck brought by the pre-set number of sought clusters k and cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experimental evaluations comparing the proposed AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority.
AB - Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently across various scenarios. Since the annotation cost is high, clustering has emerged as a favorable technique for analyzing unlabeled mixed data. To address the complex real-world clustering task, this paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM) based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, we present a distance metric unified on numerical and categorical attributes to leverage neighborhood rough sets in partitioning data objects into fine-grained compact clusters. Then, we gradually merge the current most similar clusters to avoid incorporating dissimilar objects into a similar cluster. It turns out that the proposed approach breaks through the clustering performance bottleneck brought by the pre-set number of sought clusters k and cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experimental evaluations comparing the proposed AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority.
KW - Cluster analysis
KW - Heterogeneous attributes
KW - Neighborhood rough set
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85212759058&partnerID=8YFLogxK
U2 - 10.1007/s40747-024-01695-7
DO - 10.1007/s40747-024-01695-7
M3 - Journal article
AN - SCOPUS:85212759058
SN - 2199-4536
VL - 11
JO - Complex and Intelligent Systems
JF - Complex and Intelligent Systems
IS - 1
M1 - 84
ER -