TY - JOUR
T1 - SDENK: Unbiased Subspace Density-k-Clustering
AU - Zou, Rong
AU - Zhang, Yunfan
AU - Zhao, Mingjie
AU - Tan, Zexi
AU - Zhang, Yiqun
AU - Cheung, Yiu-ming
N1 - This work was supported in part by the National Natural Science Foundation of China (NSFC) under grant 62476063, the NSFC/Research Grants Council (RGC) Joint Research Scheme under grant N_HKBU214/21, the Natural Science Foundation of Guangdong Province under grant 2025A1515011293, the General Research Fund of RGC under grants 12202622 and 12201323, and the RGC Senior Research Fellow Scheme under grant SRFS2324-2S02. The authors would like to thank Mr. Sen Feng for his implementation of the SOHI comparative method. Professional English language editing support was provided by AsiaEdit (asiaedit.com).
Publisher copyright:
© 2025 The Author(s). Published by Elsevier B.V.
PY - 2025/11/7
Y1 - 2025/11/7
N2 - Clustering is one of the most important data analysis techniques, as it extracts knowledge without requiring data labels, making it crucial in many unsupervised application scenarios. However, conventional k-clustering struggles to detect irregularly distributed clusters owing to its inherent preference for convex clusters, while density-based clustering often lacks the ability to customise an appropriate metric space for different clustering tasks owing to the lack of a task-oriented optimisation process. To simultaneously address these biases in cluster shape and metric space, this paper proposes an unbiased hybrid framework to perform density-based clustering in subspaces. These subspaces are constructed through a newly developed attribute-weighted k-clustering paradigm. To exploit the irregularly distributed clusters obtained via density-based clustering for subspace learning, a novel strategy is designed to subdivide the clusters into compact sub-clusters, which are more suitable for evaluating attribute importance through k-clustering. As a result, the proposed subspace density k-clustering algorithm inherits the shape flexibility of density-based clustering and the metric adaptiveness of k-clustering. Moreover, the learnable design enables mutual optimisation between density clusters and subspaces, yielding robust and superior clustering performance across various datasets. Comprehensive evaluations, including comparative clustering performance evaluation, ablation studies, significance tests, noise-robustness evaluation, and hyper-parameter sensitivity studies, are conducted. Among 10 compared methods, SDENK achieves an average rank of 1.75 on 12 datasets in terms of the clustering accuracy metric ARI.
AB - Clustering is one of the most important data analysis techniques, as it extracts knowledge without requiring data labels, making it crucial in many unsupervised application scenarios. However, conventional k-clustering struggles to detect irregularly distributed clusters owing to its inherent preference for convex clusters, while density-based clustering often lacks the ability to customise an appropriate metric space for different clustering tasks owing to the lack of a task-oriented optimisation process. To simultaneously address these biases in cluster shape and metric space, this paper proposes an unbiased hybrid framework to perform density-based clustering in subspaces. These subspaces are constructed through a newly developed attribute-weighted k-clustering paradigm. To exploit the irregularly distributed clusters obtained via density-based clustering for subspace learning, a novel strategy is designed to subdivide the clusters into compact sub-clusters, which are more suitable for evaluating attribute importance through k-clustering. As a result, the proposed subspace density k-clustering algorithm inherits the shape flexibility of density-based clustering and the metric adaptiveness of k-clustering. Moreover, the learnable design enables mutual optimisation between density clusters and subspaces, yielding robust and superior clustering performance across various datasets. Comprehensive evaluations, including comparative clustering performance evaluation, ablation studies, significance tests, noise-robustness evaluation, and hyper-parameter sensitivity studies, are conducted. Among 10 compared methods, SDENK achieves an average rank of 1.75 on 12 datasets in terms of the clustering accuracy metric ARI.
KW - Subspace clustering
KW - Attributes weighting
KW - Density-based clustering
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=105013515833&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2025.131225
DO - 10.1016/j.neucom.2025.131225
M3 - Journal article
SN - 0925-2312
VL - 653
JO - Neurocomputing
JF - Neurocomputing
M1 - 131225
ER -