Abstract
Communities, formed by a subset of vertices that are densely connected to each other and loosely connected to outside community members, widely exist to represent functional modules in real-world complex systems. Most existing community detection and search methods aim at finding communities at one single level, neglecting the natural properties of overlapping and hierarchy in communities. Therefore, the discovery of hierarchical graph clustering (HGC) to find communities at different levels, which is particularly useful in many applications. However, existing HGC studies suffer from two significant limitations: 1) inefficiency over large-scale networks, and 2) generating too many levels of community hierarchy without distinguishing the hierarchy differences. To address the above limitations, we revisit the problem of hierarchical graph clustering and formulate the problem based on our proposed three important properties. To tackle it, we propose theoretical-guaranteed fast solutions, in terms of algorithm complexity and hierarchy levels. We first formulate our HGC problem to admit three key properties of hierarchical communities. Based on the natural hierarchical structure of k-core, we develop a simple and importantly useful technique of core propagation. The key idea of core propagation is to take each k-core as one seed of hierarchical communities and find disjoint communities within k-core using a linear-time algorithm of label propagation. We propose two core propagation approaches of top-down and bottom-up algorithms, in terms of different search directions of k-cores by increment and decrement on k, respectively. The top-down method can find a given level of hierarchical communities in O(tm) time, where t is an input of hierarchy levels and m is the graph size. To dismiss the hardness of users' input hierarchy parameter t, the bottom-up algorithm is equipped with a well-designed strategy of auto-adjusting hierarchical levels based on the graph structure itself. We also develop the coreness weight-based label propagation to ensure the accurate label voting of compressed communities at low levels. The bottom-up method runs fast in O(m log δ(G)), where log δ(G) is a small value of the maximum coreness in graph G. Extensive experiments conducted on real-world graphs with ground-truth HGCs validate the effectiveness and efficiency of our proposed core propagation methods against state-of-the-art methods. Two case studies on the world-wide flight network and the Hong Kong road network demonstrate the particular usage of our HGC methods.
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE 41st International Conference on Data Engineering, ICDE 2025 |
| Publisher | IEEE |
| Pages | 3904-3916 |
| Number of pages | 13 |
| ISBN (Electronic) | 9798331536039 |
| ISBN (Print) | 9798331536046 |
| DOIs | |
| Publication status | Published - 20 May 2025 |
| Event | 41st IEEE International Conference on Data Engineering, ICDE 2025 - The Hong Kong Polytechnic University, Hong Kong, China Duration: 19 May 2025 → 23 May 2025 https://ieee-icde.org/2025/ https://ieee-icde.org/2025/research-papers/ https://www.computer.org/csdl/proceedings/icde/2025/26FZy3xczFS |
Publication series
| Name | Proceedings - International Conference on Data Engineering |
|---|---|
| ISSN (Print) | 1063-6382 |
| ISSN (Electronic) | 2375-026X |
Conference
| Conference | 41st IEEE International Conference on Data Engineering, ICDE 2025 |
|---|---|
| Abbreviated title | ICDE 2025 |
| Country/Territory | China |
| City | Hong Kong |
| Period | 19/05/25 → 23/05/25 |
| Internet address |
User-Defined Keywords
- core propagation
- hierarchical graph clustering