Subspace clustering of categorical and numerical data with an unknown number of clusters

Hong Jia, Yiu Ming Cheung*

*Corresponding author for this work

Research output: Contribution to journal > Journal article > peer-review

71 Citations (Scopus)

Abstract

In clustering analysis, data attributes may contribute differently to the detection of different clusters. To address this, subspace clustering techniques have been developed, which group data objects into clusters based on subsets of attributes rather than the entire data space. However, most existing subspace clustering methods are applicable to either numerical or categorical data, but not both. This paper therefore studies the soft subspace clustering of data with both numerical and categorical attributes (called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for the numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure and the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is presented, which can effectively improve the stability and accuracy of k-means-type clustering methods on numerical, categorical, and mixed data. Experimental results on different benchmark data sets show the efficacy of the proposed approach.
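
The rival penalized competitive learning (RPCL) idea mentioned in the abstract can be illustrated with a minimal sketch. The code below is the classic RPCL update for purely numerical vectors and is not the algorithm proposed in the paper: it has no attribute weights and no categorical attributes. The function name `rpcl`, the learning rates, and the epoch count are assumptions chosen for illustration only. For each input, the closest center (the winner) moves toward the input while the second closest (the rival) is pushed slightly away, so that with a deliberately oversized `k` the surplus centers tend to drift away from the data.

```python
import random


def rpcl(points, k, lr_win=0.05, lr_rival=0.002, epochs=50, seed=0):
    """Classic RPCL sketch on numerical data (illustrative, not the paper's method).

    points: list of equal-length tuples of floats; k: initial number of centers (k >= 2).
    Returns the final centers and each center's win count (initialized to 1).
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    wins = [1] * k
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            # Squared distance scaled by relative winning frequency, so that
            # frequent winners are handicapped and every center gets a chance.
            scores = [
                (wins[j] / total) * sum((a - c) ** 2 for a, c in zip(x, centers[j]))
                for j in range(k)
            ]
            ranked = sorted(range(k), key=scores.__getitem__)
            winner, rival = ranked[0], ranked[1]
            # Winner moves toward the input; rival is pushed away by a much smaller step.
            centers[winner] = [c + lr_win * (a - c) for a, c in zip(x, centers[winner])]
            centers[rival] = [c - lr_rival * (a - c) for a, c in zip(x, centers[rival])]
            wins[winner] += 1
    return centers, wins


if __name__ == "__main__":
    # Two well-separated groups, but k is deliberately set too large.
    data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
    final_centers, win_counts = rpcl(data, k=3)
    print(final_centers, win_counts)
```

In a sketch like this, centers that rarely win can be treated as redundant and discarded afterwards; how the paper combines this mechanism with attribute weighting on mixed data is described in the article itself.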

Original language: English
Pages (from-to): 3308-3325
Number of pages: 18
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 29
Issue number: 8
Publication status: Published - Aug 2018

Scopus Subject Areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

User-Defined Keywords

  • Attribute weight
  • categorical-and-numerical data
  • initialization method
  • number of clusters
  • soft subspace clustering
