TY - JOUR
T1 - Sentiment Lexicon Construction with Hierarchical Supervision Topic Model
AU - Deng, Dong
AU - Jing, Liping
AU - Yu, Jian
AU - Sun, Shaolong
AU - Ng, Michael K.
N1 - Funding Information:
Manuscript received April 27, 2018; revised September 22, 2018 and December 15, 2018; accepted December 24, 2018. Date of publication January 10, 2019; date of current version February 15, 2019. This work was supported in part by the National Natural Science Foundation of China under Grants 61822601, 61773050, and 61632004, in part by the Beijing Natural Science Foundation under Grant Z180006, and in part by the Beijing Municipal Science & Technology Commission under Grant Z181100008918012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Carlos Busso. (Corresponding author: Liping Jing.) D. Deng, L. Jing, and J. Yu are with the School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
PY - 2019/4/1
Y1 - 2019/4/1
AB - In this paper, we propose a novel hierarchical supervision topic model to construct a topic-adaptive sentiment lexicon (TaSL) for higher-level classification tasks. It is widely recognized that sentiment lexicons, as useful prior knowledge, are crucial in sentiment analysis or opinion mining. However, many existing sentiment lexicons are constructed without considering that the sentiment polarity of a word can vary across topics or domains. For example, the word 'amazing' can mean causing great surprise or wonder, but it can also mean very impressive and excellent. TaSL addresses this issue by jointly modeling the topics and sentiments of words. Each document is represented by multiple topic-sentiment pairs, where each pair is characterized by a multinomial distribution over words. This generative process is supervised by hierarchical supervision information at both the document and word levels. The main advantage of TaSL is that it captures the sentiment polarity of each word separately in each topic. The model is therefore well suited to constructing domain-specific sentiment lexicons, which in turn effectively improve the performance of sentiment classification. Extensive experiments on four publicly available datasets, MR, OMD, semEval13A, and semEval16B, demonstrate the usefulness of the proposed approach. The results show that TaSL outperforms the existing manual sentiment lexicon (MPQA), the topic-model-based domain-specific lexicon (ssLDA), the expanded lexicons (Weka-ED, Weka-STS, NRC, Liu's), and the deep-neural-network-based lexicons (nnLexicon, HIT, HSSWE).
KW - opinion mining
KW - Sentiment analysis
KW - sentiment lexicon construction
KW - text mining
KW - topic model
UR - http://www.scopus.com/inward/record.url?scp=85062215865&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2019.2892232
DO - 10.1109/TASLP.2019.2892232
M3 - Journal article
AN - SCOPUS:85062215865
SN - 2329-9290
VL - 27
SP - 704
EP - 718
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 4
M1 - 8607058
ER -