Abstract
Driven by recent advances in neural networks, various short text clustering models based on Deep Embedding Clustering (DEC) are being developed. In these works, latent representation learning and text clustering are performed simultaneously. Although these methods are becoming increasingly popular, they use purely cluster-oriented objectives, which can produce meaningless representations. To alleviate this problem, several improvements introduce additional learning objectives into the clustering process, such as models based on contrastive learning. However, existing efforts rely heavily on learning meaningful representations at the instance level and pay limited attention to learning global representations, which are necessary to capture the overall data structure at the cluster level. In this paper, we propose a novel DEC model, the deep embedded clustering model with cluster-level representation learning (DECCRL), which jointly learns cluster-level and instance-level representations. We extend the embedded topic modelling approach to introduce reconstruction constraints that help learn cluster-level representations. Experimental results on real-world short text datasets demonstrate that our model produces meaningful clusters.
Original language | English |
---|---|
Title of host publication | Proceedings of the 29th International Conference on Computational Linguistics |
Publisher | International Committee on Computational Linguistics |
Pages | 2226-2236 |
Number of pages | 11 |
Publication status | Published - 17 Oct 2022 |
Event | The 29th International Conference on Computational Linguistics, COLING 2022 - Gyeongju, Korea, Republic of; Duration: 12 Oct 2022 → 17 Oct 2022; https://coling2022.org/; https://aclanthology.org/volumes/2022.coling-1/ |
Publication series
Name | Proceedings of International Conference on Computational Linguistics |
---|---|
Number | 1 |
Volume | 29 |
ISSN (Print) | 2951-2093 |
Conference
Conference | The 29th International Conference on Computational Linguistics, COLING 2022 |
---|---|
Country/Territory | Korea, Republic of |
City | Gyeongju |
Period | 12 Oct 2022 → 17 Oct 2022 |
Internet address |
Scopus Subject Areas
- Computational Theory and Mathematics
- Computer Science Applications
- Theoretical Computer Science