A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering

Yiqun Zhang, Yiu Ming CHEUNG*, Kay Chen Tan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Ordinal data are common in many data mining and machine learning tasks. Compared to nominal data, the possible values (also called categories interchangeably) of an ordinal attribute are naturally ordered. Nevertheless, since the data values are not quantitative, the distance between two categories of an ordinal attribute is generally not well defined, which surely has a serious impact on the result of the quantitative analysis if an inappropriate distance metric is utilized. From the practical perspective, ordinal-and-nominal-attribute categorical data, i.e., categorical data associated with a mixture of nominal and ordinal attributes, is common, but the distance metric for such data has yet to be well explored in the literature. In this paper, within the framework of clustering analysis, we therefore first propose an entropy-based distance metric for ordinal attributes, which exploits the underlying order information among categories of an ordinal attribute for the distance measurement. Then, we generalize this distance metric and propose a unified one accordingly, which is applicable to ordinal-and-nominal-attribute categorical data. Compared with the existing metrics proposed for categorical data, the proposed metric is simple to use and nonparametric. More importantly, it reasonably exploits the underlying order information of ordinal attributes and statistical information of nominal attributes for distance measurement. Extensive experiments show that the proposed metric outperforms the existing counterparts on both the real and benchmark data sets.
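
The abstract does not state the formulas, so the following is only an illustrative sketch of how an entropy-based distance over categories could be computed, not the authors' definition. It assumes that each category contributes an entropy term -p log p estimated from its observed frequency, that the distance between two ordinal categories sums the terms of every category lying between them in rank order (endpoints included), and that the distance between two different nominal categories sums only their own two terms. The function names `category_entropies`, `ordinal_distance`, and `nominal_distance` are hypothetical.

```python
import math
from collections import Counter


def category_entropies(values, categories):
    """Return the entropy term -p*log(p) of each category (assumed form)."""
    counts = Counter(values)
    n = len(values)
    terms = {}
    for c in categories:
        p = counts.get(c, 0) / n
        # A category that never occurs contributes nothing.
        terms[c] = -p * math.log(p) if p > 0 else 0.0
    return terms


def ordinal_distance(a, b, categories, terms):
    """Sum the entropy terms of all categories from a to b in rank order.

    `categories` must list the categories in their natural order.
    This is an assumed reading of "exploiting order information".
    """
    i, j = sorted((categories.index(a), categories.index(b)))
    if i == j:
        return 0.0
    return sum(terms[c] for c in categories[i:j + 1])


def nominal_distance(a, b, terms):
    """Sum the entropy terms of the two categories only (no order used)."""
    if a == b:
        return 0.0
    return terms[a] + terms[b]


if __name__ == "__main__":
    ranks = ["low", "mid", "high"]          # ordered categories
    column = ["low", "low", "mid", "high"]  # toy attribute values
    t = category_entropies(column, ranks)
    print(ordinal_distance("low", "mid", ranks, t))
    print(ordinal_distance("low", "high", ranks, t))
    print(nominal_distance("low", "high", t))
```

Under these assumptions, the ordinal distance between "low" and "high" is larger than between "low" and "mid" because the path crosses an intermediate category, whereas the nominal variant ignores the ordering entirely. Any normalization across attributes with different numbers of categories is left out of this sketch.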

Original language: English
Article number: 8671525
Pages (from-to): 39-52
Number of pages: 14
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 31
Issue number: 1
Publication status: Published - Jan 2020

Scopus Subject Areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

User-Defined Keywords

  • Categorical data
  • clustering algorithms
  • data analysis
  • distance metric
  • entropy
  • order information
  • ordinal attribute
