Abstract
An ordinal attribute shares all the common characteristics of a nominal attribute but differs in that its possible values (also called categories) are naturally ordered. In clustering analysis, categorical data composed of both ordinal and nominal attributes (also called mixed-categorical data) are common. For such data, existing distance and similarity measures suffer from at least one of two drawbacks: 1) they treat ordinal attributes as nominal ones and thus ignore their order information, and 2) they assume that all attributes are independent of each other and measure the distance between two categories of a target attribute without considering the valuable information provided by the other attributes that correlate with it. These two drawbacks may distort the natural distances of attributes and in turn lead to unsatisfactory clustering results. This article, therefore, presents an entropy-based distance metric that quantifies the distance between categories by exploiting the information provided by the different attributes that correlate with the target one. It also preserves the order relationship among ordinal categories during distance measurement. Since attributes are usually correlated to different degrees, we also define the interdependence between different types of attributes to weight their contributions in forming distances. The proposed metric overcomes both drawbacks for mixed-categorical data clustering. More importantly, it conceptually unifies the distances of ordinal and nominal attributes to avoid information loss during clustering. Moreover, it is parameter free and incurs no extra computational cost compared with existing state-of-the-art counterparts. Extensive experiments show the superiority of the proposed distance metric.
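The contrast between an order-blind and an order-preserving category distance can be illustrated with a minimal Python sketch. This is not the article's formula: the function names, the per-category entropy term, and the rule of summing terms along the ordered path are assumptions made here for illustration only, and the sketch omits the inter-attribute interdependence weighting that the abstract describes.

```python
import math
from collections import Counter


def entropy_terms(values):
    """Per-category entropy contribution -p * log(p), estimated from frequencies."""
    counts = Counter(values)
    n = len(values)
    return {c: -(k / n) * math.log(k / n) for c, k in counts.items()}


def nominal_distance(a, b, terms):
    """Order-blind distance (illustrative assumption): only the two categories count."""
    if a == b:
        return 0.0
    return terms.get(a, 0.0) + terms.get(b, 0.0)


def ordinal_distance(a, b, order, terms):
    """Order-aware distance (illustrative assumption): sum the terms of every
    category on the ordered path from a to b, endpoints included."""
    if a == b:
        return 0.0
    i, j = sorted((order.index(a), order.index(b)))
    return sum(terms.get(c, 0.0) for c in order[i:j + 1])


# Hypothetical ordinal attribute with three naturally ordered categories.
order = ["low", "medium", "high"]
values = ["low", "low", "medium", "high", "high", "high"]
terms = entropy_terms(values)

print(nominal_distance("low", "high", terms))
print(ordinal_distance("low", "medium", order, terms))
print(ordinal_distance("low", "high", order, terms))
```

In this sketch, the order-aware distance between "low" and "high" exceeds the distance between either of them and "medium", whereas the order-blind version has no notion of "medium" lying between the two.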
Original language | English |
---|---|
Pages (from-to) | 758-771 |
Number of pages | 14 |
Journal | IEEE Transactions on Cybernetics |
Volume | 52 |
Issue number | 2 |
Publication status | Published - Feb 2022 |
Scopus Subject Areas
- Software
- Information Systems
- Human-Computer Interaction
- Electrical and Electronic Engineering
- Control and Systems Engineering
- Computer Science Applications
User-Defined Keywords
- Clustering analysis
- interdependence
- ordinal-and-nominal-attribute data
- unified distance metric (UDM)