TY - JOUR
T1 - Federated hierarchical clustering with automatic selection of optimal cluster numbers
AU - Zhang, Yue
AU - Qiu, Chuanlong
AU - Liao, Xinfa
AU - Zhang, Yiqun
N1 - Funding information:
This work was supported in part by the National Natural Science Foundation of China under grants 62172112 and 62476063, the National Key Research and Development Program of China under grant 2022YFE0112200, the Guangdong Provincial Key Laboratory of Intellectual Property and Big Data under grant 2018B030322016, and the Natural Science Foundation of Guangdong Province under grant 2025A1515011293.
Publisher Copyright:
© 2025 Elsevier Inc.
PY - 2025/12/4
Y1 - 2025/12/4
AB - Federated Clustering (FC) is an emerging and promising solution for exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients have a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce usable information, making the development of robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-k∗-HC, which can automatically determine an optimal number of clusters k∗ based on the data distribution explored through hierarchical clustering. To obtain the global data distribution for k∗ determination, we let each client generate micro-subclusters. Their prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows for exploring clusters of varying sizes and shapes, and the progressive merging process can self-terminate according to the neighboring relationships among the prototypes to determine k∗. Extensive experiments on diverse datasets demonstrate the FC capability of the proposed Fed-k∗-HC in accurately exploring a proper number of clusters.
KW - Automatic cluster number estimation
KW - Federated clustering
KW - Hierarchical clustering
KW - Imbalanced data
UR - https://www.scopus.com/pages/publications/105024305997
U2 - 10.1016/j.ins.2025.122957
DO - 10.1016/j.ins.2025.122957
M3 - Journal article
AN - SCOPUS:105024305997
SN - 0020-0255
VL - 733
JO - Information Sciences
JF - Information Sciences
M1 - 122957
ER -