TY - JOUR
T1 - Efficient and Effective Attributed Hypergraph Clustering via K-Nearest Neighbor Augmentation
AU - Li, Yiran
AU - Yang, Renchi
AU - Shi, Jieming
N1 - Funding information:
This work is supported by Hong Kong RGC ECS No. 25201221, National Natural Science Foundation of China No. 62202404, and project P0036831. Renchi Yang is supported by A*STAR, Singapore under Grant A19E3b0099 and the Departmental Start-up Fund from the Department of Computer Science, HKBU. This work is also supported by a research grant from Tencent Technology (Shenzhen) Co., Ltd (P0039546).
Publisher copyright:
© 2023 ACM
PY - 2023/6
Y1 - 2023/6
N2 - Hypergraphs are an omnipresent data structure used to represent high-order interactions among entities. Given a hypergraph H wherein nodes are associated with attributes, attributed hypergraph clustering (AHC) aims to partition the nodes in H into k disjoint clusters, such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes on large attributed hypergraphs for accurate clustering. Existing AHC solutions suffer from issues of prohibitive computational costs, sub-par clustering quality, or both. In this paper, we present AHCKA, an efficient approach to AHC, which achieves state-of-the-art result quality via several algorithmic designs. Under the hood, AHCKA includes three key components: (i) a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, (ii) a joint hypergraph random walk model to devise an effective optimization objective towards AHC, and (iii) a highly efficient solver with speedup techniques for the problem optimization. Extensive experiments, comparing AHCKA against 15 baselines over 8 real attributed hypergraphs, reveal that AHCKA is superior to existing competitors in terms of clustering quality, while often being up to orders of magnitude faster.
AB - Hypergraphs are an omnipresent data structure used to represent high-order interactions among entities. Given a hypergraph H wherein nodes are associated with attributes, attributed hypergraph clustering (AHC) aims to partition the nodes in H into k disjoint clusters, such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes on large attributed hypergraphs for accurate clustering. Existing AHC solutions suffer from issues of prohibitive computational costs, sub-par clustering quality, or both. In this paper, we present AHCKA, an efficient approach to AHC, which achieves state-of-the-art result quality via several algorithmic designs. Under the hood, AHCKA includes three key components: (i) a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, (ii) a joint hypergraph random walk model to devise an effective optimization objective towards AHC, and (iii) a highly efficient solver with speedup techniques for the problem optimization. Extensive experiments, comparing AHCKA against 15 baselines over 8 real attributed hypergraphs, reveal that AHCKA is superior to existing competitors in terms of clustering quality, while often being up to orders of magnitude faster.
KW - hypergraph
KW - random walk
KW - clustering
KW - eigenvector
UR - https://dl.acm.org/toc/pacmmod/2023/1/2
U2 - 10.1145/3589261
DO - 10.1145/3589261
M3 - Journal article
SN - 2836-6573
VL - 1
JO - Proceedings of the ACM on Management of Data
JF - Proceedings of the ACM on Management of Data
IS - 2
M1 - 116
ER -