Towards Unbiased Minimal Cluster Analysis of Categorical-and-Numerical Attribute Data

Yunfan Zhang, Xiaopeng Luo, Qingsheng Chen, Rong Zou, Yiqun Zhang*, Yiu Ming Cheung

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Categorical and numerical attributes occur frequently in cluster analysis tasks. To bridge the information gap between heterogeneous categorical and numerical attributes, existing approaches usually adopt prior assumptions about the distance definition and the cluster distribution, which unavoidably introduce bias into the clustering process. To address this issue, we propose to analyze mixed data comprising both categorical and numerical attributes by forming minimal clusters through neighborhood set theory. Because the minimal clusters are the smallest cluster units that can be obtained without relying on prior assumptions, they facilitate unbiased cluster analysis. To avoid information loss, distance and density metrics that are unified across numerical and categorical attributes are also proposed and used to merge the minimal clusters hierarchically. The proposed approach is highly interpretable and can accurately and robustly cluster data sets composed of any combination of numerical and categorical attributes. Extensive experimental evaluations demonstrate its efficacy.

Original language: English
Title of host publication: Pattern Recognition
Subtitle of host publication: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part II
Editors: Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher: Springer Cham
Pages: 254-269
Number of pages: 16
ISBN (Electronic): 9783031781667
ISBN (Print): 9783031781650
Publication status: Published - 2 Dec 2024
Event: 27th International Conference on Pattern Recognition - Kolkata, India
Duration: 1 Dec 2024 – 5 Dec 2024
https://link.springer.com/book/10.1007/978-3-031-78107-0 (Conference proceedings)
https://icpr2024.org/ (Conference website)

Publication series

Name: Lecture Notes in Computer Science
Volume: 15302
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: ICPR: International Conference on Pattern Recognition

Conference

Conference: 27th International Conference on Pattern Recognition
Abbreviated title: ICPR 2024
Country/Territory: India
City: Kolkata
Period: 1/12/24 – 5/12/24

Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

User-Defined Keywords

  • Categorical attribute
  • Cluster analysis
  • Mixed data
  • Neighborhood rough set
  • Unsupervised learning
