On the performance of feature weighting K-means for text subspace clustering

Liping Jing*, Michael K. Ng, Jun Xu, Joshua Zhexue Huang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

8 Citations (Scopus)

Abstract

Text clustering is an effective way not only of organizing textual information, but also of discovering interesting patterns. Most existing methods, however, suffer from two main drawbacks: they cannot provide an understandable representation for text clusters, and they cannot scale to very large text collections. Highly scalable text clustering algorithms are becoming increasingly relevant. In this paper, we present a performance study of a new subspace clustering algorithm for large sparse text data. This algorithm automatically calculates the feature weights in the k-means clustering process. The feature weights are used to discover clusters from subspaces of the text vector space and to identify terms that represent the semantics of the clusters. A series of experiments has been conducted to test the performance of the algorithm, including resource consumption and clustering quality. The experimental results on real-world text data show that our algorithm converges quickly to a locally optimal solution and scales with the number of documents, terms, and clusters.
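
The abstract does not state the weight-update rule, so the following is only a minimal sketch of the general idea of computing per-cluster feature weights inside the k-means loop. It is not the authors' algorithm. The function name `feature_weighted_kmeans`, the parameter `gamma`, and the choice of a softmax over negative within-cluster dispersion are all assumptions made for illustration.

```python
import math


def feature_weighted_kmeans(docs, init_centers, gamma=1.0, max_iter=50):
    """Illustrative k-means variant with one feature-weight vector per cluster.

    docs: list of equal-length lists of floats (e.g. term-weight rows).
    init_centers: list of k starting centers (hypothetical parameter).
    gamma: assumed smoothing parameter; larger values flatten the weights.
    """
    k = len(init_centers)
    n_features = len(docs[0])
    centers = [list(c) for c in init_centers]
    # Start from uniform weights; each cluster's weights always sum to 1.
    weights = [[1.0 / n_features] * n_features for _ in range(k)]
    labels = None

    for _ in range(max_iter):
        # Assignment step: weighted squared distance to each center.
        new_labels = []
        for d in docs:
            dists = [
                sum(w * (x - c) ** 2
                    for w, x, c in zip(weights[j], d, centers[j]))
                for j in range(k)
            ]
            new_labels.append(dists.index(min(dists)))

        for j in range(k):
            members = [d for d, lab in zip(docs, new_labels) if lab == j]
            if not members:
                continue  # keep the previous center and weights
            # Center update: per-feature mean of the cluster members.
            centers[j] = [sum(col) / len(members) for col in zip(*members)]
            # Within-cluster dispersion of each feature.
            disp = [
                sum((d[t] - centers[j][t]) ** 2 for d in members)
                for t in range(n_features)
            ]
            # Assumed weight update: softmax of negative dispersion, so
            # features that vary little inside the cluster weigh more.
            low = min(disp)
            exps = [math.exp(-(x - low) / gamma) for x in disp]
            total = sum(exps)
            weights[j] = [e / total for e in exps]

        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

    return labels, centers, weights


# Toy data: features 0 and 1 separate the two groups, 2 and 3 are noise.
docs = [
    [1.0, 0.0, 0.9, 0.1],
    [1.0, 0.0, 0.1, 0.8],
    [1.0, 0.0, 0.5, 0.4],
    [0.0, 1.0, 0.2, 0.9],
    [0.0, 1.0, 0.8, 0.1],
    [0.0, 1.0, 0.4, 0.5],
]
labels, centers, weights = feature_weighted_kmeans(docs, [docs[0], docs[3]])
```

In this toy run the noisy features end up with smaller weights than the two discriminating ones, which is the sense in which the weights pick out a subspace for each cluster. On sparse text data, terms absent from every document in a cluster also have zero dispersion, so a practical variant would need to handle that case; this sketch does not.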

Original language: English
Title of host publication: Advances in Web-Age Information Management - 6th International Conference, WAIM 2005, Proceedings
Pages: 502-512
Number of pages: 11
Publication status: Published - 2005
Event: 6th International Conference on Advances in Web-Age Information Management, WAIM 2005 - Hangzhou, China
Duration: 11 Oct 2005 - 13 Oct 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3739 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Conference on Advances in Web-Age Information Management, WAIM 2005
Country/Territory: China
City: Hangzhou
Period: 11/10/05 - 13/10/05

Scopus Subject Areas

  • Theoretical Computer Science
  • Computer Science (all)

User-Defined Keywords

  • Convergency
  • Feature Weighting
  • Scalability
  • Subspace Clustering
  • Text Clustering
