Abstract
Subspace clustering seeks to identify subspaces that segment a set of n data points into k (k ≪ n) groups, and it has emerged as a powerful tool for analyzing data from various domains, especially images and videos. Recently, several studies have demonstrated the great potential of subspace clustering models for partitioning vertices in attributed graphs, a task referred to as SCAG. However, these works either incur significant computational overhead for constructing the n × n self-expressive matrix, or fail to incorporate graph topology and attribute data into the subspace clustering framework effectively, and thus compromise result quality.
Motivated by this, this paper presents two effective and efficient algorithms, S2CAG and M-S2CAG, for SCAG computation. In particular, S2CAG obtains superb performance through three major contributions. First, we formulate a new objective function for SCAG with a refined representation model for vertices and two non-trivial constraints. Second, we develop an efficient linear-time optimization solver based on our theoretically grounded problem transformation and a well-thought-out adaptive strategy. Third, we conduct an in-depth analysis to disclose the theoretical connection of S2CAG to conductance minimization, which further inspires the design of M-S2CAG, a variant that maximizes modularity. Our extensive experiments, comparing S2CAG and M-S2CAG against 17 competitors over 8 benchmark datasets, show that our solutions outperform all baselines in terms of clustering quality measured against the ground truth while delivering high efficiency.
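For readers unfamiliar with the two graph-partition measures named in the abstract, the following is a minimal sketch of their standard textbook definitions on an unweighted, undirected graph. It is not the S2CAG or M-S2CAG algorithm; the function names, the edge-list representation, and the toy graph are assumptions made purely for illustration.

```python
from collections import defaultdict


def conductance(edges, cluster):
    """Conductance of a vertex set S: cut(S, rest) / min(vol(S), vol(rest)).

    Hypothetical helper; `edges` is a list of undirected (u, v) pairs.
    """
    cluster = set(cluster)
    cut = 0       # edges with exactly one endpoint in S
    vol_in = 0    # sum of degrees of vertices in S
    vol_out = 0   # sum of degrees of vertices outside S
    for u, v in edges:
        for w in (u, v):
            if w in cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (u in cluster) != (v in cluster):
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def modularity(edges, labels):
    """Newman modularity Q = sum over clusters c of e_c/m - (d_c / 2m)^2.

    Hypothetical helper; `labels` maps each vertex to its cluster id.
    """
    m = len(edges)
    intra = defaultdict(int)   # e_c: edges with both endpoints in cluster c
    degree = defaultdict(int)  # d_c: total degree of cluster c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
LABELS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

if __name__ == "__main__":
    print(conductance(EDGES, {0, 1, 2}))  # 1 cut edge over volume 7
    print(modularity(EDGES, LABELS))      # 2 * (3/7 - 1/4) = 5/14
```

A lower conductance and a higher modularity both indicate clusters that are dense inside and sparsely connected to the rest of the graph, which is why the two measures can motivate closely related clustering objectives.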
Original language | English |
---|---|
Title of host publication | KDD '25 |
Subtitle of host publication | Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 |
Place of Publication | New York |
Publisher | Association for Computing Machinery (ACM) |
Pages | 789–799 |
Number of pages | 11 |
ISBN (Print) | 9798400712456 |
DOIs | |
Publication status | Published - 20 Jul 2025 |
Event | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining - Toronto, Canada; Duration: 3 Aug 2025 → 7 Aug 2025; Conference proceedings: https://dl.acm.org/doi/proceedings/10.1145/3690624 ; Conference website: https://kdd2025.kdd.org/ |
Publication series
Name | KDD: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
---|---|
Publisher | Association for Computing Machinery |
Conference
Conference | 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
---|---|
Abbreviated title | KDD 2025 |
Country/Territory | Canada |
City | Toronto |
Period | 3/08/25 → 7/08/25 |
User-Defined Keywords
- attributed graph
- conductance
- modularity
- subspace clustering