Exploiting word cluster information for unsupervised feature selection

Qingyao Wu*, Yunming Ye, Kwok Po NG, Hanjing Su, Joshua Huang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-review

3 Citations (Scopus)

Abstract

This paper presents an approach for integrating word cluster information into unsupervised feature selection. In our scheme, the words in the whole feature space are clustered into groups based on their co-occurrence statistics. The resulting word cluster information and the bag-of-words information are combined to measure the goodness of each word, which is our basic metric for selecting discriminative features. By exploiting word cluster information, we extend three well-known unsupervised feature selection methods and propose three new methods. A series of experiments is performed on three benchmark text data sets (20 Newsgroups, Reuters-21578, and CLASSIC3). The experimental results show that the new unsupervised feature selection methods select more discriminative features and, in turn, improve clustering performance.
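The scheme outlined in the abstract (cluster words by co-occurrence, then score each word from both its own statistics and those of its cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the three extended methods, the clustering algorithm, or the combination rule. The sketch uses document frequency as a stand-in bag-of-words criterion, a greedy single-pass clustering on cosine similarity of document-occurrence sets, and a hypothetical weight `alpha` to blend word-level and cluster-level scores. It should not be read as the paper's formulation.

```python
from collections import defaultdict
from math import sqrt


def occurrence_sets(docs):
    """Map each word to the set of document indices that contain it."""
    occ = defaultdict(set)
    for i, doc in enumerate(docs):
        for word in doc:
            occ[word].add(i)
    return dict(occ)


def cosine(a, b):
    """Cosine similarity between two binary occurrence sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cluster_words(occ, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the paper's algorithm).

    A word joins the first cluster whose seed word co-occurs with it strongly
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of words; cluster[0] is its seed
    for word in sorted(occ):
        for cluster in clusters:
            if cosine(occ[word], occ[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def score_words(docs, alpha=0.5, threshold=0.5):
    """Blend word-level and cluster-level document frequency (hypothetical rule)."""
    n_docs = len(docs)
    occ = occurrence_sets(docs)
    scores = {}
    for cluster in cluster_words(occ, threshold):
        # Documents that contain at least one word of the cluster.
        cluster_docs = set().union(*(occ[w] for w in cluster))
        cluster_df = len(cluster_docs) / n_docs
        for word in cluster:
            word_df = len(occ[word]) / n_docs
            scores[word] = alpha * word_df + (1 - alpha) * cluster_df
    return scores


def select_features(docs, k, **kwargs):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = score_words(docs, **kwargs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

With a toy corpus such as `[["ball", "game"], ["ball", "game", "team"], ["cpu", "disk"], ["cpu"]]`, a rare word like `team` receives some credit from the cluster it shares with `ball` and `game`, which is the intuition behind using cluster information in addition to per-word counts.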

Original language: English
Title of host publication: PRICAI 2010
Subtitle of host publication: Trends in Artificial Intelligence - 11th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Pages: 292-303
Number of pages: 12
DOIs
Publication status: Published - 2010
Event: 11th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2010 - Daegu, Korea, Republic of
Duration: 30 Aug 2010 → 2 Sep 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6230 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2010
Country/Territory: Korea, Republic of
City: Daegu
Period: 30/08/10 → 2/09/10

Scopus Subject Areas

  • Theoretical Computer Science
  • Computer Science (all)
