A New Distance Metric for Unsupervised Learning of Categorical Data

Hong Jia, Yiu Ming CHEUNG*, Jiming LIU

*Corresponding author for this work

Research output: Contribution to journal, Journal article, peer-review

84 Citations (Scopus)


Abstract

The distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. In general, measuring distance for numerical data is a tractable task, but it can be a nontrivial problem for categorical data sets. This paper, therefore, presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values of one attribute is determined by both the frequency probabilities of these two values and the values of other attributes that have high interdependence with the attribute under consideration. A dynamic attribute weight is further designed to adjust the contribution of each attribute distance to the distance between whole data objects. Promising experimental results on different real data sets have shown the effectiveness of the proposed distance metric.
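The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea and not the metric defined in the paper. It makes several assumptions: interdependence between two attributes is scored as mutual information divided by joint entropy, attributes are kept when that score reaches a hypothetical `threshold`, and the distance between two differing values sums the joint frequency probabilities of the value pairs over the kept attributes. The dynamic attribute weight is omitted and replaced with a uniform average. All function names are made up for this example.

```python
from collections import Counter
from math import log


def entropy(counts, n):
    """Shannon entropy (natural log) of a Counter of occurrences over n rows."""
    return -sum((c / n) * log(c / n) for c in counts.values())


def interdependence(rows, r, s):
    """Assumed score: mutual information of attributes r and s over joint entropy."""
    n = len(rows)
    h_r = entropy(Counter(row[r] for row in rows), n)
    h_s = entropy(Counter(row[s] for row in rows), n)
    h_rs = entropy(Counter((row[r], row[s]) for row in rows), n)
    if h_rs == 0:
        return 0.0
    return (h_r + h_s - h_rs) / h_rs


def attribute_distance(rows, x, y, r, threshold=0.1):
    """Illustrative distance between objects x and y on attribute r."""
    if x[r] == y[r]:
        return 0.0
    n = len(rows)
    total, weight_sum = 0.0, 0.0
    for s in range(len(x)):
        # The attribute itself always takes part, with weight 1.
        w = 1.0 if s == r else interdependence(rows, r, s)
        if w < threshold:
            continue  # ignore weakly related attributes
        joint = Counter((row[r], row[s]) for row in rows)
        # Joint frequency probabilities of the two observed value pairs.
        total += w * (joint[(x[r], x[s])] + joint[(y[r], y[s])]) / n
        weight_sum += w
    return total / weight_sum


def object_distance(rows, x, y, threshold=0.1):
    """Uniform average of attribute distances (no dynamic weighting here)."""
    d = len(x)
    return sum(attribute_distance(rows, x, y, r, threshold) for r in range(d)) / d


rows = [("red", "round"), ("red", "round"), ("green", "long"), ("yellow", "long")]
print(object_distance(rows, rows[0], rows[2]))
```

In this sketch identical objects have distance zero and the measure is symmetric; whether it satisfies the other properties of a metric, or matches the paper's behavior, is not established here.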

Original language: English
Article number: 7120127
Pages (from-to): 1065-1079
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
Issue number: 5
Publication status: Published - May 2016

Scopus Subject Areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

User-Defined Keywords

  • Attribute interdependence
  • categorical attribute
  • clustering analysis
  • distance metric
  • Unsupervised learning

