Subspace clustering of text documents with feature weighting k-means algorithm

Liping Jing*, Michael K. Ng, Jun Xu, Joshua Zhexue Huang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

87 Citations (Scopus)

Abstract

This paper presents a new method to solve the problem of clustering large and complex text data. The method is based on a new subspace clustering algorithm that automatically calculates the feature weights in the k-means clustering process. In clustering sparse text data, the feature weights are used to discover clusters from subspaces of the document vector space and to identify key words that represent the semantics of the clusters. We present a modification of the published algorithm to solve the sparsity problem that occurs in text clustering. Experimental results on real-world text data show that the new method outperformed the standard k-means and bisecting k-means algorithms while retaining the efficiency of the k-means clustering process.
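The abstract does not state the update formulas, so the following is only a minimal sketch of the general idea of k-means with per-cluster feature weights, not the authors' implementation. It assumes that each cluster keeps one weight per feature, that a weight is inversely related to the within-cluster dispersion of that feature, and that a small constant is added to every dispersion term so that features with zero dispersion (common in sparse text data) do not absorb all the weight. The names `weighted_kmeans`, `beta`, and `sigma` are hypothetical, and dense NumPy arrays are used only for illustration.

```python
import numpy as np


def weighted_kmeans(X, k, beta=2.0, sigma=1e-3, n_iter=20, seed=0):
    """Sketch of k-means with one weight per (cluster, feature) pair.

    beta (> 1) and sigma (> 0) are assumed, illustrative parameters.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Start from k distinct data points and uniform feature weights.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, m), 1.0 / m)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, m)
        dist = ((diff2 + sigma) * (weights ** beta)[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)

        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the previous center and weights
            # Center update: plain mean of the cluster members.
            centers[l] = members.mean(axis=0)
            # Weight update: low dispersion gives a high weight.
            # sigma keeps every dispersion strictly positive.
            disp = ((members - centers[l]) ** 2 + sigma).sum(axis=0)
            inv = disp ** (-1.0 / (beta - 1.0))
            weights[l] = inv / inv.sum()

    return labels, centers, weights


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    # Two groups that differ only in the first feature; the rest is noise.
    a = np.hstack([demo_rng.normal(0.0, 0.1, (30, 1)), demo_rng.normal(0, 1, (30, 4))])
    b = np.hstack([demo_rng.normal(5.0, 0.1, (30, 1)), demo_rng.normal(0, 1, (30, 4))])
    labels, centers, weights = weighted_kmeans(np.vstack([a, b]), k=2)
    print(weights.round(3))  # one row of feature weights per cluster
```

In this sketch, the largest weights in each row indicate the features (for text, the terms) that characterize a cluster most tightly, which is the sense in which the weights can be read as cluster key words.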

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining
Subtitle of host publication: 9th Pacific-Asia Conference, PAKDD 2005, Hanoi, Vietnam, May 18-20, 2005, Proceedings
Editors: Tu Bao Ho, David Cheung, Huan Liu
Publisher: Springer Berlin Heidelberg
Pages: 802-812
Number of pages: 11
Edition: 1st
ISBN (Electronic): 9783540319351
ISBN (Print): 3540260765, 9783540260769
Publication status: Published - 10 May 2005
Event: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005 - Melia Hanoi Hotel, Hanoi, Viet Nam
Duration: 18 May 2005 to 20 May 2005
https://www.jaist.ac.jp/PAKDD-05/ (conference website and program)

Publication series

Name: Lecture Notes in Computer Science
Volume: 3518
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer
ISSN (Print): 2945-9133
ISSN (Electronic): 2945-9141

Name: PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Conference

Conference: 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2005
Country/Territory: Viet Nam
City: Hanoi
Period: 18/05/05 to 20/05/05

Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

User-Defined Keywords

  • Cluster Interpretation
  • Feature Weighting
  • High Dimensional Data
  • Subspace Clustering
  • Text Mining
