Simple yet Effective Graph Distillation via Clustering

Yurui Lai*, Taiyan Zhang, Renchi Yang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Despite plentiful successes achieved by graph representation learning in various domains, training graph neural networks (GNNs) remains challenging due to the tremendous computational overhead incurred by sizable graphs in practice. Recently, graph data distillation (GDD), which seeks to distill large graphs into compact and informative ones, has emerged as a promising technique for efficient GNN training. However, most existing GDD works rely on heuristics that align model gradients or representation distributions on the condensed and original graphs, leading to compromised result quality, expensive training for distilling large graphs, or both. Motivated by this, this paper presents ClustGDD, an efficient and effective GDD approach. Under the hood, ClustGDD synthesizes the condensed graph and node attributes through fast and theoretically grounded clustering that minimizes the within-cluster sum of squares and maximizes homophily on the original graph. The fundamental idea is inspired by our empirical and theoretical findings unveiling the connection between clustering and empirical condensation quality measured by the Fréchet Inception Distance, a well-known quality metric for synthetic images. Furthermore, to mitigate the adverse effects of homophily-based clustering, ClustGDD refines the nodal attributes of the condensed graph with a small augmentation learned via class-aware graph sampling and a consistency loss. Extensive experiments show that GNNs trained over condensed graphs produced by ClustGDD consistently achieve superior or comparable node-classification performance to state-of-the-art GDD methods on five benchmark datasets, while being orders of magnitude faster.
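The abstract outlines the core mechanism: the condensed graph and its node attributes are synthesized by clustering that minimizes the within-cluster sum of squares (a k-means-style objective) on the original graph. The snippet below is a minimal, hypothetical sketch of that general idea, clustering node attributes and aggregating attributes and edges per cluster; it is not the authors' ClustGDD implementation, and the function name `condense_by_clustering` and its parameters are assumptions made purely for illustration.

```python
# Hypothetical sketch of clustering-based graph condensation (not the ClustGDD code).
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

def condense_by_clustering(adj: csr_matrix, X: np.ndarray, k: int, seed: int = 0):
    """Cluster the n original nodes into k super-nodes, then aggregate
    node attributes and edges per cluster to form a condensed graph."""
    # k-means minimizes the within-cluster sum of squares over node attributes.
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)

    # One-hot cluster-assignment matrix P of shape (n, k).
    n = X.shape[0]
    P = csr_matrix((np.ones(n), (np.arange(n), labels)), shape=(n, k))

    # Condensed attributes: per-cluster mean of node features (the centroids).
    sizes = np.maximum(np.asarray(P.sum(axis=0)).ravel(), 1.0)
    X_syn = np.asarray(P.T @ X) / sizes[:, None]

    # Condensed adjacency: total edge weight between every pair of clusters.
    A_syn = (P.T @ adj @ P).toarray()
    return A_syn, X_syn, labels
```

In the paper's actual method, the clustering additionally accounts for homophily on the original graph, and the condensed attributes are further refined via class-aware sampling and a consistency loss, as stated in the abstract.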
Original language: English
Title of host publication: KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1229–1240
Number of pages: 12
Volume: 2
ISBN (Electronic): 9798400714542
DOIs
Publication status: Published - 3 Aug 2025
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto Convention Centre, Toronto, Canada
Duration: 3 Aug 2025 – 7 Aug 2025
https://dl.acm.org/doi/proceedings/10.1145/3690624 (Conference proceeding)
https://kdd2025.kdd.org/ (Conference website)
https://kdd2025.kdd.org/schedule-at-a-glance/ (Conference schedule)

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ISSN (Print): 2154-817X

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Abbreviated title: KDD 2025
Country/Territory: Canada
City: Toronto
Period: 3/08/25 – 7/08/25

User-Defined Keywords

  • clustering
  • graph data distillation
  • graph neural networks
