Federated hierarchical clustering with automatic selection of optimal cluster numbers

  • Yue Zhang
  • Chuanlong Qiu
  • Xinfa Liao
  • Yiqun Zhang*
  • *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

1 Citation (Scopus)

Abstract

Federated Clustering (FC) is an emerging and promising solution for exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients have a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce usable information, making the development of robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-k-HC, which can automatically determine an optimal number of clusters k based on the data distribution explored through hierarchical clustering. To obtain the global data distribution for k determination, we let each client generate micro-subclusters. Their prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows for exploring clusters of varying sizes and shapes, and the progressive merging process can self-terminate according to the neighboring relationships among the prototypes to determine k. Extensive experiments on diverse datasets demonstrate the FC capability of the proposed Fed-k-HC in accurately exploring a proper number of clusters.
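The abstract describes a two-stage pipeline: each client summarizes its local data as micro-subcluster prototypes, and the server merges the uploaded prototypes hierarchically until a stopping rule fixes k. The sketch below illustrates that data flow under stated assumptions and is not the authors' Fed-k-HC. Clients run a small k-means with farthest-first seeding and upload only (centroid, count) pairs. The server substitutes plain single-linkage merging with a "largest jump in merge height" stopping rule for the paper's density-based, neighbor-relationship termination criterion. The names `client_prototypes`, `server_merge`, `farthest_first`, `make_blob`, and the toy three-client data are hypothetical.

```python
import math
import random

# ASSUMPTION: hypothetical, simplified stand-in for the pipeline in the abstract,
# NOT the authors' Fed-k-HC (their merging is density-based).


def farthest_first(points, m):
    """Deterministic seeding: repeatedly pick the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < m:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def client_prototypes(points, m=5, iters=10):
    """Client side: summarize local data as up to m micro-subclusters.

    Only (centroid, count) pairs leave the client; raw points stay local.
    """
    centers = farthest_first(points, min(m, len(points)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        groups = [g for g in groups if g]  # drop empty micro-subclusters
        centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups]
    return [(c, len(g)) for c, g in zip(centers, groups)]


def server_merge(prototypes):
    """Server side: single-linkage merging of prototypes that self-terminates
    at the largest jump in merge height.

    ASSUMPTION: this largest-gap rule replaces the paper's criterion.
    """
    n = len(prototypes)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal-style single linkage: process prototype pairs by increasing distance.
    edges = sorted(
        (math.dist(prototypes[i][0], prototypes[j][0]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    merges = []  # (height, i, j) in merge order; heights are non-decreasing
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))
    if len(merges) < 2:
        return [0] * n, min(n, 1)  # too few prototypes to compare gaps

    # Keep merges up to the largest jump between consecutive merge heights.
    gaps = [merges[t + 1][0] - merges[t][0] for t in range(len(merges) - 1)]
    kept = max(range(len(gaps)), key=gaps.__getitem__) + 1
    parent[:] = range(n)
    for _, i, j in merges[:kept]:
        parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(n)})
    labels = [roots.index(find(i)) for i in range(n)]
    return labels, len(roots)


def make_blob(rng, center, size, std=0.5):
    return [(rng.gauss(center[0], std), rng.gauss(center[1], std)) for _ in range(size)]


rng = random.Random(42)
A = make_blob(rng, (0, 0), 200)  # imbalanced global cluster sizes: 200 / 50 / 20
B = make_blob(rng, (20, 0), 50)
C = make_blob(rng, (0, 20), 20)
clients = [A[:120], A[120:] + B[:30], B[30:] + C]  # non-IID split across 3 clients

uploads = [proto for pts in clients for proto in client_prototypes(pts)]
labels, k = server_merge(uploads)
sizes = sorted(
    sum(count for (_, count), lab in zip(uploads, labels) if lab == g) for g in range(k)
)
print(k, sizes)
```

On this well-separated toy data the sketch should recover k = 3 with global sizes [20, 50, 200]. Because every prototype carries its point count, the server can report imbalanced cluster sizes without ever receiving raw points. A largest-gap rule on single-linkage heights is only a simple illustration of self-terminating merging; it can chain through noisy bridges between clusters.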

Original language: English
Article number: 122957
Number of pages: 21
Journal: Information Sciences
Volume: 733
Early online date: 4 Dec 2025
Publication status: E-pub ahead of print (4 Dec 2025)

User-Defined Keywords

  • Automatic cluster number estimation
  • Federated clustering
  • Hierarchical clustering
  • Imbalanced data
