Learnable Weighting of Intra-attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes

Yiqun Zhang, Yiu Ming CHEUNG

Research output: Contribution to journal, Article, peer-review

Abstract

The success of categorical data clustering relies heavily on the distance metric that measures the degree of dissimilarity between two objects. However, most existing clustering methods treat the two categorical subtypes, i.e., nominal and ordinal attributes, in the same way when calculating dissimilarity, without considering the relative order information of ordinal values. Moreover, interdependence may exist among nominal and ordinal attributes, which is worth exploring as an indicator of dissimilarity. This paper therefore studies the intrinsic difference and connection between nominal and ordinal attribute values from a graph-like perspective. Accordingly, we propose a novel distance metric that measures the intra-attribute distances of nominal and ordinal attributes in a unified way while preserving the order relationship among ordinal values. Subsequently, we propose a new clustering algorithm that integrates the learning of intra-attribute distance weights and the partitioning of data objects into a single learning paradigm rather than two separate steps, thereby avoiding a suboptimal solution. Experiments show the efficacy of the proposed algorithm in comparison with existing counterparts.
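
To make the idea of an order-preserving intra-attribute distance concrete, here is a minimal sketch. It is not the metric proposed in the paper, whose definition is not given in this abstract. It assumes, purely for illustration, that ordinal values sit on a chain with a non-negative weight on each adjacent step, and that any two distinct nominal values are separated by a single weight. The names `ordinal_distance`, `nominal_distance`, and `step_weights` are hypothetical.

```python
from typing import Sequence


def ordinal_distance(i: int, j: int, step_weights: Sequence[float]) -> float:
    """Distance between ordinal ranks i and j: the sum of the weights of
    the adjacent steps lying between them (step k joins rank k and k + 1)."""
    lo, hi = sorted((i, j))
    return float(sum(step_weights[lo:hi]))


def nominal_distance(a: str, b: str, weight: float = 1.0) -> float:
    """Distance between nominal values: 0 if identical, else a single weight."""
    return 0.0 if a == b else weight


# Hypothetical ordinal attribute with four ranks (0..3) and three step weights.
steps = [1.0, 0.5, 2.0]

d_near = ordinal_distance(0, 1, steps)   # one step apart
d_far = ordinal_distance(0, 3, steps)    # three steps apart
d_nom = nominal_distance("red", "blue")  # unordered values

print(d_near, d_far, d_nom)
```

Under these assumptions, ranks that are further apart are never closer than ranks that are nearer, which is one simple way to keep order information that a plain match/mismatch distance discards. In a learnable-weighting setting the step weights would be adjusted during clustering rather than fixed by hand as they are here.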

Original language: English
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Publication status: E-pub ahead of print - 3 Feb 2021

Scopus Subject Areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics

User-Defined Keywords

  • Categorical data clustering
  • intra-attribute distance
  • learnable weighting
  • nominal-and-ordinal attribute
