TY - JOUR
T1 - Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information Networks
AU - Zhou, Yingli
AU - Fang, Yixiang
AU - Ma, Chenhao
AU - Hou, Tianci
AU - Huang, Xin
N1 - This work was supported in part by NSFC under Grant 62102341 and 62302421, Guangdong Talent Program under Grant 2021QN02X826, and Shenzhen Science and Technology Program under Grants JCYJ20220530143602006 and ZDSYS 20211021111415025. This work was supported by Guangdong Provincial Key Laboratory of Mathematical Foundations for Artificial Intelligence (2023B1212010001), and Hong Kong RGC grants No. GRF/12201923 and CRF/C2003-23Y. This work was also supported by Basic and Applied Basic Research Fund in Guangdong Province under Grant 2023A1515011280, and the Guangdong Provincial Key Laboratory of Big Data Computing, The Chinese University of Hong Kong, Shenzhen.
Publisher Copyright:
© 2024, VLDB Endowment. All rights reserved.
PY - 2024/8/30
Y1 - 2024/8/30
N2 - In the heterogeneous information network (HIN), a motif-clique is a "complete graph" for a given motif (or a small connected graph) that could capture the desired relationship in the motif. The maximal motif-cliques of HINs have found various applications in community discovery, recommendation, and biological network analysis. The state-of-the-art algorithm for enumerating maximal motif-cliques may have to explore all possible subgraphs of a maximal motif-clique and check whether a maximal motif-clique has been enumerated at each recursive step, which is very time-consuming. To improve the efficiency of enumeration, in this paper, we develop efficient algorithms for maximal motif-clique enumeration over large HINs. We first introduce an order-based framework to avoid duplicated enumeration, which results in lower time complexity compared to the existing algorithm. We then propose a pivot-based pruning strategy, which significantly reduces the search space. We further optimize the process of identifying the candidate sets and locating the subgraphs containing the maximal motif-cliques. Extensive experiments on five real-world HINs demonstrate that our proposed algorithm achieves high efficiency and is up to three orders of magnitude faster than the state-of-the-art algorithm.
AB - In the heterogeneous information network (HIN), a motif-clique is a "complete graph" for a given motif (or a small connected graph) that could capture the desired relationship in the motif. The maximal motif-cliques of HINs have found various applications in community discovery, recommendation, and biological network analysis. The state-of-the-art algorithm for enumerating maximal motif-cliques may have to explore all possible subgraphs of a maximal motif-clique and check whether a maximal motif-clique has been enumerated at each recursive step, which is very time-consuming. To improve the efficiency of enumeration, in this paper, we develop efficient algorithms for maximal motif-clique enumeration over large HINs. We first introduce an order-based framework to avoid duplicated enumeration, which results in lower time complexity compared to the existing algorithm. We then propose a pivot-based pruning strategy, which significantly reduces the search space. We further optimize the process of identifying the candidate sets and locating the subgraphs containing the maximal motif-cliques. Extensive experiments on five real-world HINs demonstrate that our proposed algorithm achieves high efficiency and is up to three orders of magnitude faster than the state-of-the-art algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85205369801&partnerID=8YFLogxK
U2 - 10.14778/3681954.3681975
DO - 10.14778/3681954.3681975
M3 - Conference article
AN - SCOPUS:85205369801
SN - 2150-8097
VL - 17
SP - 2946
EP - 2959
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 11
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 26 August 2024 through 30 August 2024
ER -