NILC: Discovering New Intents with LLM-assisted Clustering

  • Hongtao Wang
  • Renchi Yang*
  • Wenqing Lin
  • *Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

New intent discovery (NID) seeks to recognize both new and known intents from unlabeled user utterances, a task widely used in practical dialogue systems. Existing work on NID mainly adopts a cascaded architecture: the first stage encodes the utterances into informative text embeddings, and the second groups similar embeddings into clusters (i.e., intents), typically by K-Means. However, such a cascaded pipeline fails to leverage feedback from both stages for mutual refinement, and embedding-only clustering overlooks nuanced textual semantics, leading to suboptimal performance.

To bridge this gap, this paper proposes NILC, a novel clustering framework tailored to effective NID. NILC follows an iterative workflow in which cluster assignments are updated by refining the cluster centroids and the text embeddings of uncertain utterances with the aid of large language models (LLMs). Specifically, NILC first uses LLMs to create additional semantic centroids for clusters, thereby enriching the contextual semantics of the Euclidean centroids of embeddings. LLMs are then used to augment hard samples (ambiguous or terse utterances) identified from clusters by rewriting them for subsequent cluster correction. Further, in the semi-supervised setting, we inject supervision signals through two non-trivial techniques, seeding and soft must-links, for more accurate NID. Extensive experiments comparing NILC against multiple recent baselines under both unsupervised and semi-supervised settings show that NILC consistently achieves significant performance improvements on six benchmark datasets from diverse domains.
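
The abstract describes the workflow only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows one way to flag uncertain utterances after K-Means and reassign them once they have been rewritten. Everything in it is an assumption made for illustration: the relative-margin rule for "uncertain", the `margin` threshold, and the `embed` and `rewrite` callables (hypothetical stand-ins for a text encoder and an LLM rewriting call). Semantic centroids and the semi-supervised signals (seeding, soft must-links) are left out.

```python
import numpy as np

def pairwise_dists(X, centroids):
    """Euclidean distance from every row of X to every centroid."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means, the usual second stage of a cascaded pipeline."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = pairwise_dists(X, centroids).argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, pairwise_dists(X, centroids).argmin(axis=1)

def uncertain_indices(X, centroids, margin=0.2):
    """Indices whose nearest and second-nearest centroids are almost equally far.

    Assumed criterion (not from the paper); requires at least two centroids.
    """
    two_nearest = np.sort(pairwise_dists(X, centroids), axis=1)[:, :2]
    rel_gap = (two_nearest[:, 1] - two_nearest[:, 0]) / (two_nearest[:, 1] + 1e-12)
    return np.flatnonzero(rel_gap < margin)

def refine_assignments(texts, X, centroids, embed, rewrite, margin=0.2):
    """Rewrite uncertain utterances, re-embed them, and reassign every point.

    `embed` (list of str -> 2-D array) and `rewrite` (str -> str) are
    hypothetical callables supplied by the caller.
    """
    hard = uncertain_indices(X, centroids, margin)
    X_new = X.astype(float)  # astype returns a copy, so X is left untouched
    if hard.size:
        X_new[hard] = embed([rewrite(texts[i]) for i in hard])
    return pairwise_dists(X_new, centroids).argmin(axis=1), hard
```

In an iterative setup, one would alternate `kmeans` (or a centroid update) with `refine_assignments` until the assignments stop changing; how NILC schedules and combines these steps is not stated in the abstract.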
Original language: English
Title of host publication: Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining
Editors: Giuseppe Manco, Barbara Poblete
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 671–680
Number of pages: 10
ISBN (Print): 9798400722929
DOIs
Publication status: Published - 21 Feb 2026

Publication series

Name: WSDM: Web Search and Data Mining Conference
Publisher: Association for Computing Machinery

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • intent discovery
  • clustering
  • large language models
