TY - GEN
T1 - When Structure Meets Keywords
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
AU - Zhu, Yuanyuan
AU - He, Jian
AU - Ye, Junhao
AU - Qin, Lu
AU - Huang, Xin
AU - Yu, Jeffrey Xu
N1 - Funding Information:
Acknowledgments. The work was supported in part by grants of Natural Science Foundation 61972291 and 61702435, Natural Science Foundation of Hubei Province 2018CFB519, Fundamental Research Funds for the Central Universities 2042019kf0224, ARC FT200100787, the Research Grant Council of the Hong Kong, China No. 14203618 and No. 1420291. Yuanyuan Zhu is the corresponding author.
PY - 2020/10/19
Y1 - 2020/10/19
N2 - As an online, query-dependent variant of the well-known community detection problem, community search has been studied for years to find communities containing the query vertices. Along with the generation of graphs with rich attribute information, attributed community search has attracted increasing interest recently, aiming to select communities where vertices are cohesively connected and share homogeneous attributes. However, existing community models may include cut-edges/vertices and thus cannot well guarantee the strong connectivity required by a cohesive community. In this paper, we propose a new cohesive attributed community (CAC) model that can ensure both structure cohesiveness and attribute cohesiveness of communities. Specifically, for a query with vertex vq and keyword set S, we aim to find the cohesively connected communities containing vq with the most shared keywords in S. It is nontrivial as we need to explore all possible subsets of S to verify the existence of structure cohesive communities until we find the communities with the most common keywords. To tackle this problem, we make efforts in two aspects. The first is to reduce the candidate keyword subsets. We achieve this by exploring the anti-monotonicity and neighborhood-constraint properties of our CAC model so that we can filter out the unpromising keyword subsets. The second is to speed up the verification process for each candidate keyword subset. We propose two indexes TIndex and MTIndex to reduce the size of the candidate subgraph before the verification. Moreover, we derive two new properties based on these indexes to reduce the candidate keyword subsets further. We conducted extensive experimental studies on four real-world graphs and validated the effectiveness and efficiency of our approaches.
AB - As an online, query-dependent variant of the well-known community detection problem, community search has been studied for years to find communities containing the query vertices. Along with the generation of graphs with rich attribute information, attributed community search has attracted increasing interest recently, aiming to select communities where vertices are cohesively connected and share homogeneous attributes. However, existing community models may include cut-edges/vertices and thus cannot well guarantee the strong connectivity required by a cohesive community. In this paper, we propose a new cohesive attributed community (CAC) model that can ensure both structure cohesiveness and attribute cohesiveness of communities. Specifically, for a query with vertex vq and keyword set S, we aim to find the cohesively connected communities containing vq with the most shared keywords in S. It is nontrivial as we need to explore all possible subsets of S to verify the existence of structure cohesive communities until we find the communities with the most common keywords. To tackle this problem, we make efforts in two aspects. The first is to reduce the candidate keyword subsets. We achieve this by exploring the anti-monotonicity and neighborhood-constraint properties of our CAC model so that we can filter out the unpromising keyword subsets. The second is to speed up the verification process for each candidate keyword subset. We propose two indexes TIndex and MTIndex to reduce the size of the candidate subgraph before the verification. Moreover, we derive two new properties based on these indexes to reduce the candidate keyword subsets further. We conducted extensive experimental studies on four real-world graphs and validated the effectiveness and efficiency of our approaches.
KW - attributed graphs
KW - community search
KW - truss model
UR - http://www.scopus.com/inward/record.url?scp=85095865268&partnerID=8YFLogxK
U2 - 10.1145/3340531.3412006
DO - 10.1145/3340531.3412006
M3 - Conference proceeding
AN - SCOPUS:85095865268
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1913
EP - 1922
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery (ACM)
Y2 - 19 October 2020 through 23 October 2020
ER -