On discovery of extremely low-dimensional clusters using semi-supervised projected clustering

Kevin Y. Yip*, David W. Cheung, Michael K. Ng

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-review

47 Citations (Scopus)

Abstract

Recent studies suggest that projected clusters with extremely low dimensionality exist in many real datasets. A number of projected clustering algorithms have been proposed in the past several years, but few can identify clusters with dimensionality lower than 10% of the total number of dimensions, which are commonly found in some real datasets such as gene expression profiles. In this paper we propose a new algorithm that can accurately identify projected clusters with relevant dimensions as few as 5% of the total number of dimensions. It makes use of a robust objective function that combines object clustering and dimension selection into a single optimization problem. The algorithm can also utilize domain knowledge in the form of labeled objects and labeled dimensions to improve its clustering accuracy. We believe this is the first semi-supervised projected clustering algorithm. Both theoretical analysis and experimental results show that by using a small amount of input knowledge, possibly covering only a portion of the underlying classes, the new algorithm can be further improved to accurately detect clusters with only 1% of the dimensions being relevant. The algorithm is also useful in getting a target set of clusters when there are multiple possible groupings of the objects.
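The abstract names two ideas without giving their details. The first is an objective that couples object assignment with per-cluster dimension selection. The second is the use of labeled objects and labeled dimensions as optional input. The Python sketch below illustrates those ideas in their most generic form only. It is not the paper's algorithm or objective function. The variance-ratio rule for selecting dimensions, the score function, the threshold `m`, and every function name (`projected_clustering_sketch`, `select_dims`, `fit_score`, `objective`) are hypothetical choices made for illustration.

```python
import random
from statistics import fmean, pvariance


def column(data, rows, j):
    """Values of dimension j for the given row indices."""
    return [data[i][j] for i in rows]


def select_dims(data, members, global_var, m):
    """Assumed rule: keep dimension j for a cluster when the cluster's variance
    on j is below m times the whole dataset's variance on j."""
    if len(members) < 2:
        return []
    return [j for j, gv in enumerate(global_var)
            if gv > 0 and pvariance(column(data, members, j)) < m * gv]


def fit_score(x, centroid, dims, global_var, m):
    """Sum over a cluster's selected dimensions; a term is positive when x lies
    within sqrt(m * global variance) of the centroid on that dimension."""
    return sum(1.0 - (x[j] - centroid[j]) ** 2 / (m * global_var[j]) for j in dims)


def objective(data, assign, dims, global_var, m):
    """Toy objective (an assumption): each selected dimension adds
    1 - cluster variance / (m * global variance), so clusters that are tight
    on many selected dimensions score high."""
    total = 0.0
    for c, cdims in enumerate(dims):
        members = [i for i, a in enumerate(assign) if a == c]
        if len(members) < 2:
            continue
        total += sum(1.0 - pvariance(column(data, members, j)) / (m * global_var[j])
                     for j in cdims)
    return total


def projected_clustering_sketch(data, k, m=0.3, labeled_objects=None,
                                labeled_dims=None, iters=20, seed=0):
    """Hypothetical semi-supervised projected clustering loop (not the paper's method).

    labeled_objects: {object index: cluster id}; these objects never change cluster.
    labeled_dims: {cluster id: [dimension indices]}; these dimensions are never dropped.
    Returns (assignment list, selected dimensions per cluster, objective value).
    """
    labeled_objects = labeled_objects or {}
    labeled_dims = labeled_dims or {}
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    global_var = [pvariance(column(data, range(n), j)) for j in range(d)]
    usable = [j for j in range(d) if global_var[j] > 0]
    forced = [[j for j in labeled_dims.get(c, []) if global_var[j] > 0] for c in range(k)]

    # Seed each cluster from its labeled objects if any, otherwise from one random object.
    centroids, dims = [], []
    for c in range(k):
        seeds = [i for i, lab in labeled_objects.items() if lab == c] or [rng.randrange(n)]
        centroids.append([fmean(column(data, seeds, j)) for j in range(d)])
        dims.append(forced[c] or list(usable))

    def best_cluster(i):
        # Labeled objects stay in their given cluster; others take the best fit.
        if i in labeled_objects:
            return labeled_objects[i]
        return max(range(k), key=lambda c: fit_score(data[i], centroids[c], dims[c],
                                                     global_var, m))

    assign = None
    for _ in range(iters):
        new_assign = [best_cluster(i) for i in range(n)]
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        # Recompute centroids and re-select dimensions from the current members.
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:
                continue
            centroids[c] = [fmean(column(data, members, j)) for j in range(d)]
            dims[c] = sorted(set(select_dims(data, members, global_var, m)) | set(forced[c]))
    return assign, dims, objective(data, assign, dims, global_var, m)
```

In this sketch, labeled objects are pinned to their clusters and labeled dimensions are always kept. This is one simple, assumed way to let partial domain knowledge steer the search. Clusters that receive no labels fall back to a random seed object, so knowledge covering only some of the classes can still be supplied.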

Original language: English
Title of host publication: Proceedings - 21st International Conference on Data Engineering, ICDE 2005
Editors: Stephanie Kawada
Publisher: IEEE
Pages: 329-340
Number of pages: 12
ISBN (Print): 0769522858
Publication status: Published - 5 Apr 2005
Event: 21st International Conference on Data Engineering, ICDE 2005 - Tokyo, Japan
Duration: 5 Apr 2005 to 8 Apr 2005
https://ieeexplore.ieee.org/xpl/conhome/9680/proceeding

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1063-6382
ISSN (Electronic): 2375-026X

Conference

Conference: 21st International Conference on Data Engineering, ICDE 2005
Country/Territory: Japan
City: Tokyo
Period: 5/04/05 to 8/04/05

Scopus Subject Areas

  • Software
  • Signal Processing
  • Information Systems
