Improving Deep Embedded Clustering via Learning Cluster-level Representations

Qing Yin, Zhihua Wang, Yunya Song, Yida Xu, Shuai Niu, Liang Bai, Yike Guo, Xian Yang*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Driven by recent advances in neural networks, various Deep Embedded Clustering (DEC)-based short text clustering models have been developed. In these works, latent representation learning and text clustering are performed simultaneously. Although these methods are becoming increasingly popular, they rely on purely cluster-oriented objectives, which can produce meaningless representations. To alleviate this problem, several improvements have been proposed that introduce additional learning objectives into the clustering process, such as models based on contrastive learning. However, existing efforts rely heavily on learning meaningful representations at the instance level and pay limited attention to learning global representations, which are necessary to capture the overall data structure at the cluster level. In this paper, we propose a novel DEC model, named the Deep Embedded Clustering model with Cluster-level Representation Learning (DECCRL), to jointly learn cluster- and instance-level representations. Here, we extend the embedded topic modelling approach to introduce reconstruction constraints that help learn cluster-level representations. Experimental results on real-world short text datasets demonstrate that our model produces meaningful clusters.
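For readers unfamiliar with the DEC family of models referenced in the abstract, the sketch below illustrates the generic idea of combining a cluster-oriented objective with a reconstruction constraint: a Student's t soft assignment of embeddings to learnable centroids, a sharpened target distribution used as a self-training signal, and an autoencoder reconstruction loss added on top. This is a minimal illustration only, not the authors' DECCRL; the network sizes, cluster count, loss weighting, and the absence of the cluster-level topic-model component are all simplifying assumptions.

```python
# Minimal DEC-style objective plus a reconstruction term (illustrative only,
# not the DECCRL model from the paper; all dimensions/weights are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDEC(nn.Module):
    def __init__(self, input_dim=2000, latent_dim=20, n_clusters=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))
        # Learnable cluster centroids in the latent space.
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))

    def soft_assign(self, z, alpha=1.0):
        # Student's t-kernel similarity between embeddings and centroids (as in DEC).
        dist = torch.cdist(z, self.centroids) ** 2
        q = (1.0 + dist / alpha) ** (-(alpha + 1) / 2)
        return q / q.sum(dim=1, keepdim=True)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z), self.soft_assign(z)

def target_distribution(q):
    # Sharpened target distribution used as the self-training signal in DEC.
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

x = torch.rand(32, 2000)                      # toy batch of bag-of-words vectors
model = ToyDEC()
z, x_hat, q = model(x)
p = target_distribution(q).detach()
# Clustering loss (KL to the target distribution) plus a reconstruction constraint.
loss = F.kl_div(q.log(), p, reduction="batchmean") + F.mse_loss(x_hat, x)
loss.backward()
```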

Original language: English
Title of host publication: Proceedings of the 29th International Conference on Computational Linguistics
Publisher: International Committee on Computational Linguistics
Pages: 2226-2236
Number of pages: 11
Publication status: Published - 17 Oct 2022
Event: The 29th International Conference on Computational Linguistics, COLING 2022 - Gyeongju, Korea, Republic of
Duration: 12 Oct 2022 - 17 Oct 2022
https://coling2022.org/
https://aclanthology.org/volumes/2022.coling-1/

Publication series

Name: Proceedings of the International Conference on Computational Linguistics
Number: 1
Volume: 29
ISSN (Print): 2951-2093

Conference

Conference: The 29th International Conference on Computational Linguistics, COLING 2022
Country/Territory: Korea, Republic of
City: Gyeongju
Period: 12/10/22 - 17/10/22

Scopus Subject Areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
