TY - GEN
T1 - Fast Approximate All Pairwise CoSimRanks via Random Projection
AU - Yang, Renchi
AU - Xiao, Xiaokui
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021/12/2
Y1 - 2021/12/2
N2 - Given a graph G with n nodes, and two nodes u, v∈ G, the CoSimRank value s(u, v) quantifies the similarity between u and v based on graph topology. Compared to SimRank, CoSimRank is shown to be more accurate and effective in many real-world applications including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. The computation of all pairwise CoSimRanks in G is highly expensive and challenging. Existing solutions all focus on devising approximate algorithms for the computation of all pairwise CoSimRanks. To attain a desired absolute accuracy guarantee ϵ, the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires O(n3log2(ln(1/ϵ))) time, which is prohibitively expensive even ϵ is large. In this paper, we propose RPCS, a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of RPCS is to approximate the n× n matrix multiplications in CoSimRank computation via random projection. Theoretically, RPCS runs in O(n2ln(n)/ϵ2ln(1/ϵ)) time and meanwhile ensures an absolute error of at most ϵ in each CoSimRank value in G with a high probability. Extensive experiments using six real graphs demonstrate that RPCS is more than up to orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, RPCS answers the ϵ -approximate (ϵ= 0.1 ) all pairwise CoSimRank query within 4 h, using a single commodity server, while existing solutions fail to terminate within a day.
AB - Given a graph G with n nodes, and two nodes u, v∈ G, the CoSimRank value s(u, v) quantifies the similarity between u and v based on graph topology. Compared to SimRank, CoSimRank is shown to be more accurate and effective in many real-world applications including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. The computation of all pairwise CoSimRanks in G is highly expensive and challenging. Existing solutions all focus on devising approximate algorithms for the computation of all pairwise CoSimRanks. To attain a desired absolute accuracy guarantee ϵ, the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires O(n3log2(ln(1/ϵ))) time, which is prohibitively expensive even ϵ is large. In this paper, we propose RPCS, a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of RPCS is to approximate the n× n matrix multiplications in CoSimRank computation via random projection. Theoretically, RPCS runs in O(n2ln(n)/ϵ2ln(1/ϵ)) time and meanwhile ensures an absolute error of at most ϵ in each CoSimRank value in G with a high probability. Extensive experiments using six real graphs demonstrate that RPCS is more than up to orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, RPCS answers the ϵ -approximate (ϵ= 0.1 ) all pairwise CoSimRank query within 4 h, using a single commodity server, while existing solutions fail to terminate within a day.
KW - Approximate algorithm
KW - CoSimRank
KW - Random projection
UR - http://www.scopus.com/inward/record.url?scp=85121919075&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-90888-1_34
DO - 10.1007/978-3-030-90888-1_34
M3 - Conference proceeding
AN - SCOPUS:85121919075
SN - 9783030908874
T3 - Lecture Notes in Computer Science
SP - 438
EP - 452
BT - Web Information Systems Engineering – WISE 2021
A2 - Zhang, Wenjie
A2 - Zou, Lei
A2 - Maamar, Zakaria
A2 - Chen, Lu
PB - Springer Cham
T2 - 22nd International Conference on Web Information Systems Engineering, WISE 2021
Y2 - 26 October 2021 through 29 October 2021
ER -