Effective Clustering on Large Attributed Bipartite Graphs

Renchi Yang, Yidu Wu, Xiaoyang Lin, Qichen Wang, Tsz Nam Chan, Jieming Shi

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging.

In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the arts, TPO is often more than 40x faster over both small and large ABGs.
Original languageEnglish
Title of host publicationKDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
EditorsRicardo Baeza-Yates, Francesco Bonchi
Place of PublicationNew York
PublisherAssociation for Computing Machinery (ACM)
Pages3782-3793
Number of pages12
Edition1st
ISBN (Electronic)9798400704901
DOIs
Publication statusPublished - 24 Aug 2024
Event30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024 - Barcelona, Spain
Duration: 25 Aug 202429 Aug 2024
https://kdd2024.kdd.org/

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print)2154-817X

Conference

Conference30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2024
Country/TerritorySpain
CityBarcelona
Period25/08/2429/08/24
Internet address

Scopus Subject Areas

  • Software
  • Information Systems

User-Defined Keywords

  • attributes
  • bipartite graphs
  • clustering
  • eigenvector

Fingerprint

Dive into the research topics of 'Effective Clustering on Large Attributed Bipartite Graphs'. Together they form a unique fingerprint.

Cite this