TY - CONF
T1 - Exploiting word cluster information for unsupervised feature selection
AU - Wu, Qingyao
AU - Ye, Yunming
AU - NG, Kwok Po
AU - Su, Hanjing
AU - Huang, Joshua
N1 - Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - This paper presents an approach for integrating word clustering information into unsupervised feature selection. In our scheme, the words in the whole feature space are clustered into groups based on word co-occurrence statistics. The resulting word cluster information is combined with the bag-of-words information to measure the goodness of each word, which is our basic metric for selecting discriminative features. By exploiting word cluster information, we extend three well-known unsupervised feature selection methods, yielding three new methods. A series of experiments is performed on three benchmark text data sets (20 Newsgroups, Reuters-21578 and CLASSIC3). The experimental results show that the new unsupervised feature selection methods select more discriminative features and in turn improve clustering performance.
UR - http://www.scopus.com/inward/record.url?scp=78049241090&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15246-7_28
DO - 10.1007/978-3-642-15246-7_28
M3 - Conference proceeding
AN - SCOPUS:78049241090
SN - 3642152457
SN - 9783642152450
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 292
EP - 303
BT - PRICAI 2010
T2 - 11th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2010
Y2 - 30 August 2010 through 2 September 2010
ER -