A New Distance Metric Exploiting Heterogeneous Interattribute Relationship for Ordinal-and-Nominal-Attribute Data Clustering

Yiqun Zhang, Yiu Ming Cheung*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

An ordinal attribute has all the common characteristics of a nominal one but differs from it by having naturally ordered possible values (also called categories). In clustering analysis tasks, categorical data composed of both ordinal and nominal attributes (also called mixed-categorical data) are common. In this setting, existing distance and similarity measures suffer from at least one of the following two drawbacks: 1) they treat ordinal attributes directly as nominal ones and thus ignore their order information, and 2) they assume that all attributes are independent of each other and measure the distance between two categories of a target attribute without considering the valuable information provided by the other attributes that correlate with the target one. These two drawbacks may distort the natural distances of attributes and, in turn, lead to unsatisfactory clustering results. This article, therefore, presents an entropy-based distance metric that quantifies the distance between categories by exploiting the information provided by different attributes that correlate with the target one. It also preserves the order relationship among ordinal categories during the distance measurement. Since attributes are usually correlated to different degrees, we also define the interdependence between different types of attributes to weight their contributions in forming distances. The proposed metric overcomes the two above-mentioned drawbacks for mixed-categorical data clustering. More importantly, it conceptually unifies the distances of ordinal and nominal attributes to avoid information loss during clustering. Moreover, it is parameter-free and incurs no extra computational cost compared with existing state-of-the-art counterparts. Extensive experiments show the superiority of the proposed distance metric.
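The two drawbacks named in the abstract can be illustrated on toy data. The following is a minimal sketch and is not the article's metric: the data set, the attribute names, and the functions `hamming`, `rank_distance`, and `context_distance` are hypothetical, and total variation distance between conditional distributions stands in for the entropy-based measure, whose exact definition is not given here.

```python
from collections import Counter

# Hypothetical toy data: (satisfaction [ordinal], plan [nominal]).
ORDER = ["low", "medium", "high"]  # assumed natural order of the ordinal attribute
rows = [
    ("low", "basic"), ("low", "basic"), ("medium", "basic"),
    ("medium", "premium"), ("high", "premium"), ("high", "premium"),
]

def hamming(a, b):
    """Nominal-style distance: 0 if equal, 1 otherwise (order is ignored)."""
    return 0.0 if a == b else 1.0

def rank_distance(a, b, order=ORDER):
    """Order-aware distance: normalized gap between rank positions."""
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)

def conditional_dist(category, rows):
    """Distribution of the second attribute among rows whose first
    attribute equals `category`."""
    counts = Counter(other for value, other in rows if value == category)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def context_distance(a, b, rows):
    """Total variation distance between the conditional distributions
    of the other attribute given categories `a` and `b`."""
    pa, pb = conditional_dist(a, rows), conditional_dist(b, rows)
    keys = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in keys)

# Drawback 1: Hamming distance cannot tell "low vs. medium" from "low vs. high".
print(hamming("low", "medium"), hamming("low", "high"))
# An order-aware distance separates the two pairs.
print(rank_distance("low", "medium"), rank_distance("low", "high"))
# Drawback 2: a correlated attribute also carries information about
# how far apart two categories are.
print(context_distance("low", "medium", rows), context_distance("low", "high", rows))
```

On this toy data, the Hamming distance assigns the same value to both pairs, while the rank-based and context-based distances both place "low" closer to "medium" than to "high".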

Original language: English
Pages (from-to): 758-771
Number of pages: 14
Journal: IEEE Transactions on Cybernetics
Volume: 52
Issue number: 2
Publication status: Published - 1 Feb 2022

Scopus Subject Areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
  • Computer Science Applications

User-Defined Keywords

  • Clustering analysis
  • interdependence
  • ordinal-and-nominal-attribute data
  • unified distance metric (UDM)
  • Cats
  • Distance measurement
  • Frequency measurement
  • Data mining
  • Task analysis
  • Information theory
