Adaptive micro partition and hierarchical merging for accurate mixed data clustering

Yunfan Zhang, Rong Zou, Yiqun Zhang*, Yue Zhang, Yiu Ming Cheung, Kangshun Li

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Heterogeneous attribute data (also called mixed data), characterized by attributes with numerical and categorical values, occur frequently in real-world scenarios. Since annotation is costly, clustering has become a favored technique for analyzing unlabeled mixed data. To address this complex real-world clustering task, this paper proposes a new clustering method called Adaptive Micro Partition and Hierarchical Merging (AMPHM), based on neighborhood rough set theory and a novel hierarchical merging mechanism. Specifically, we present a distance metric that unifies numerical and categorical attributes, which allows neighborhood rough sets to partition data objects into fine-grained, compact clusters. We then gradually merge the currently most similar clusters to avoid incorporating dissimilar objects into the same cluster. As a result, the proposed approach overcomes the clustering performance bottleneck caused by the pre-set number of sought clusters k and by cluster distribution bias, and is thus capable of clustering datasets comprising various combinations of numerical and categorical attributes. Extensive experiments comparing AMPHM with state-of-the-art counterparts on various datasets demonstrate its superiority.
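The abstract describes a two-stage pipeline: a unified mixed-data distance feeds a neighborhood-based micro partition, and the resulting micro-clusters are then merged, most similar pair first. The sketch below is a toy illustration of that general idea only, not the authors' AMPHM. Everything in it is an assumption chosen for brevity: the distance (range-normalized absolute difference for numerical attributes, 0/1 mismatch for categorical ones, averaged over all attributes), the greedy seeding rule over δ-neighborhoods N(x) = {y : d(x, y) <= δ}, average-linkage merging, and stopping at a target cluster count. All function names are hypothetical.

```python
import numpy as np


def normalize_numeric(X_num):
    """Min-max scale each numerical attribute to [0, 1] (constant columns map to 0)."""
    X_num = np.asarray(X_num, dtype=float)
    lo = X_num.min(axis=0)
    span = X_num.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant attributes
    return (X_num - lo) / span


def mixed_distance_matrix(X_num, X_cat):
    """Hypothetical unified metric: mean per-attribute dissimilarity, where a
    numerical attribute contributes |a - b| on the [0, 1] scale and a
    categorical attribute contributes 0 (match) or 1 (mismatch)."""
    Xn = normalize_numeric(X_num)
    Xc = np.asarray(X_cat)
    num_part = np.abs(Xn[:, None, :] - Xn[None, :, :]).sum(axis=2)
    cat_part = (Xc[:, None, :] != Xc[None, :, :]).sum(axis=2)
    return (num_part + cat_part) / (Xn.shape[1] + Xc.shape[1])


def micro_partition(D, delta):
    """Greedy covering by delta-neighborhoods (hypothetical seeding rule):
    seed each micro-cluster at the unassigned object whose neighborhood holds
    the most unassigned objects, and assign that whole neighborhood to it."""
    n = D.shape[0]
    labels = np.full(n, -1)
    cluster_id = 0
    while (labels == -1).any():
        free = labels == -1
        # Neighborhoods restricted to objects that are still unassigned.
        neigh = (D <= delta) & free[None, :]
        counts = np.where(free, neigh.sum(axis=1), -1)
        seed = int(np.argmax(counts))
        labels[neigh[seed]] = cluster_id  # includes the seed itself (D[seed, seed] = 0)
        cluster_id += 1
    return labels


def hierarchical_merge(D, labels, n_clusters):
    """Repeatedly merge the two clusters with the smallest average pairwise
    distance (average linkage) until n_clusters remain."""
    clusters = [list(np.flatnonzero(labels == c)) for c in np.unique(labels)]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = D[np.ix_(clusters[i], clusters[j])].mean()
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    out = np.empty(D.shape[0], dtype=int)
    for c, members in enumerate(clusters):
        out[members] = c
    return out


# Toy mixed data: one numerical and one categorical attribute, two obvious groups.
X_num = [[0.0], [0.1], [0.05], [10.0], [10.1], [9.9]]
X_cat = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
D = mixed_distance_matrix(X_num, X_cat)
micro = micro_partition(D, delta=0.003)        # fine-grained micro-clusters
final = hierarchical_merge(D, micro, n_clusters=2)
print(micro, final)
```

With this small δ the first group forms one compact micro-cluster while the second splits into singletons, and merging then recovers the two groups. Stopping at `n_clusters` is only a simplification for the toy run; the abstract does not state AMPHM's own stopping criterion.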

Original language: English
Article number: 84
Number of pages: 14
Journal: Complex and Intelligent Systems
Volume: 11
Issue number: 1
Early online date: 19 Dec 2024
Publication status: Published, Jan 2025

Scopus Subject Areas

  • Information Systems
  • Engineering (miscellaneous)
  • Computational Mathematics
  • Artificial Intelligence

User-Defined Keywords

  • Cluster analysis
  • Heterogeneous attributes
  • Neighborhood rough set
  • Unsupervised learning

