Soft Subspace Clustering for High-Dimensional Data

Liping Jing, Michael K. Ng, Joshua Zhexue Huang

Research output: Chapter in book/report/conference proceeding > Entry for encyclopedia/dictionary > peer-review

Abstract

High-dimensional data is common in real-world data mining applications. Text data is a typical example: in text mining, a document is represented as a vector of terms whose dimension equals the total number of unique terms in the data set, which is usually in the thousands. High-dimensional data also occurs in business. In retail, for example, suppliers are often categorized according to their business behavior in order to manage supplier relationships effectively (Zhang, Huang, Qian, Xu, & Jing, 2006). Supplier behavior data is high-dimensional, with thousands of attributes describing that behavior, including product items, ordered amounts, order frequencies, product quality and so forth. DNA microarray data is another example. Although various clustering methods are available (Jain & Dubes, 1988), clustering high-dimensional data requires special treatment (Swanson, 1990; Jain, Murty, & Flynn, 1999; Cai, He, & Han, 2005; Kontaki, Papadopoulos, & Manolopoulos, 2007). One family of methods for high-dimensional data is subspace clustering, which aims to find clusters in subspaces rather than in the entire data space. In subspace clustering, each cluster is a set of objects identified by a subset of dimensions, and different clusters are represented in different subsets of dimensions. Soft subspace clustering assumes that different dimensions contribute differently to identifying the objects in a cluster. It represents the importance of each dimension as a weight that measures the degree of that dimension's contribution to the cluster. Soft subspace clustering can therefore find the cluster memberships of objects and identify the subspace of each cluster in the same clustering process.
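The idea of per-cluster dimension weights can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the algorithm described in the entry: the function names `soft_subspace_kmeans` and `weighted_sq_dist`, the parameters `gamma` and `init_centers`, and the exponential weight update (a dimension with lower within-cluster dispersion receives a higher weight) are all choices made here for illustration. It alternates between assigning objects by weighted distance and updating each cluster's center and weights, so memberships and subspaces are estimated in the same loop.

```python
import math
import random


def weighted_sq_dist(x, center, w):
    """Squared Euclidean distance with one weight per dimension."""
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, center))


def soft_subspace_kmeans(data, k, gamma=1.0, n_iter=20, init_centers=None, seed=0):
    """Illustrative k-means variant in which every cluster has its own weights.

    gamma (hypothetical parameter) controls how sharply the weights
    concentrate on low-dispersion dimensions.
    """
    n_dims = len(data[0])
    if init_centers is None:
        init_centers = random.Random(seed).sample(data, k)
    centers = [list(c) for c in init_centers]
    weights = [[1.0 / n_dims] * n_dims for _ in range(k)]  # start uniform
    labels = [0] * len(data)

    for _ in range(n_iter):
        # Step 1: assign each object to the nearest center under that
        # cluster's own dimension weights.
        for i, x in enumerate(data):
            labels[i] = min(
                range(k),
                key=lambda c: weighted_sq_dist(x, centers[c], weights[c]),
            )
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # Step 2: move the center to the per-dimension mean of its members.
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
            # Step 3: compute within-cluster dispersion per dimension, then
            # turn it into weights that sum to 1 (assumed update rule).
            disp = [
                sum((x[j] - centers[c][j]) ** 2 for x in members)
                for j in range(n_dims)
            ]
            low = min(disp)  # shift by the minimum for numerical stability
            raw = [math.exp(-(d - low) / gamma) for d in disp]
            total = sum(raw)
            weights[c] = [r / total for r in raw]
    return labels, centers, weights


# Toy data: the first four objects agree on dimension 0, the last four on
# dimension 1; dimension 2 is noise for both groups.
data = [
    (0.0, 0.0, 1.0), (0.1, 2.0, 5.0), (-0.1, 4.0, 3.0), (0.05, 1.0, 7.0),
    (5.0, 10.0, 2.0), (9.0, 10.1, 6.0), (7.0, 9.9, 4.0), (6.0, 10.05, 8.0),
]
labels, centers, weights = soft_subspace_kmeans(
    data, k=2, init_centers=[data[0], data[4]]
)
```

On this toy data, the weights of the first cluster concentrate on dimension 0 and those of the second cluster on dimension 1, so each cluster is described by a different subspace. A larger `gamma` would spread the weights more evenly across dimensions.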
Original language: English
Title of host publication: Encyclopedia of Data Warehousing and Mining
Editors: John Wang
Publisher: IGI Global
Pages: 1810-1814
Number of pages: 5
Edition: 2nd
ISBN (Electronic): 9781605660110
ISBN (Print): 9781605660103, 1605660108
Publication status: Published - 31 Aug 2008