Abstract
Categorical and numerical attributes occur frequently in cluster analysis tasks. To bridge the information gap between heterogeneous categorical and numerical attributes, existing approaches usually adopt prior assumptions about the distance definition and the cluster distribution, which unavoidably introduce bias into the clustering process. To address this issue, we propose to analyze mixed data comprising both categorical and numerical attributes by forming minimal clusters through neighborhood set theory. As the minimal clusters are the smallest cluster units that can be obtained without relying on prior assumptions, they facilitate unbiased cluster analysis. To avoid information loss, we also propose distance and density metrics that are unified across numerical and categorical attributes, and use them to merge the minimal clusters hierarchically. The proposed approach is highly interpretable and is capable of accurately and robustly clustering data sets composed of any combination of numerical and categorical attributes. Extensive experimental evaluations demonstrate its efficacy.
Original language | English |
---|---|
Title of host publication | Pattern Recognition |
Subtitle of host publication | 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part II |
Editors | Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal |
Publisher | Springer Cham |
Pages | 254-269 |
Number of pages | 16 |
ISBN (Electronic) | 9783031781667 |
ISBN (Print) | 9783031781650 |
Publication status | Published - 2 Dec 2024 |
Event | 27th International Conference on Pattern Recognition, Kolkata, India. Duration: 1 Dec 2024 → 5 Dec 2024. Conference proceedings: https://link.springer.com/book/10.1007/978-3-031-78107-0. Conference website: https://icpr2024.org/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 15302 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Name | ICPR: International Conference on Pattern Recognition |
---|---|
Conference
Conference | 27th International Conference on Pattern Recognition |
---|---|
Abbreviated title | ICPR 2024 |
Country/Territory | India |
City | Kolkata |
Period | 1/12/24 → 5/12/24 |
Scopus Subject Areas
- Theoretical Computer Science
- General Computer Science
User-Defined Keywords
- Categorical attribute
- Cluster analysis
- Mixed data
- Neighborhood rough set
- Unsupervised learning