TY - JOUR
T1 - Human-Powered Data Cleaning for Probabilistic Reachability Queries on Uncertain Graphs
AU - Lin, Xin
AU - Peng, Yun
AU - Choi, Byron
AU - Xu, Jianliang
N1 - Funding Information:
This work was supported by the National High-tech 863 Program of China (No. 2015AA015801) and HK-RGC grants 12244916, 12200114, and HKBU12201315. Xin Lin’s work was also supported by NSFC (No. 61572193) and NSF of Shanghai (No. 17ZR1444900). Yun Peng’s work was supported by NSFC (No. 61502258, 71402083) and NSF of Shandong (No. ZR2014FQ007). Yun Peng is the corresponding author.
Publisher copyright:
© 2017 IEEE
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Uncertain graph models are widely used in real-world applications such as knowledge graphs and social networks. To capture the uncertainty, each edge in an uncertain graph is associated with an existential probability that signifies the likelihood of the existence of the edge. One notable issue of querying uncertain graphs is that the results are sometimes uninformative because of the edge uncertainty. In this paper, we consider probabilistic reachability queries, which are one of the fundamental classes of graph queries. To make the results more informative, we adopt a crowdsourcing-based approach to clean the uncertain edges. However, considering the time and monetary cost of crowdsourcing, it is a problem to efficiently select a limited set of edges for cleaning that maximizes the quality improvement. We prove that the edge selection problem is #P-hard. In light of the hardness of the problem, we propose a series of edge selection algorithms, followed by a number of optimization techniques and pruning heuristics for reducing the computation time. Our experimental results demonstrate that our proposed techniques outperform a random selection by up to 27 times in terms of the result quality improvement and the brute-force solution by up to 60 times in terms of the elapsed time.
AB - Uncertain graph models are widely used in real-world applications such as knowledge graphs and social networks. To capture the uncertainty, each edge in an uncertain graph is associated with an existential probability that signifies the likelihood of the existence of the edge. One notable issue of querying uncertain graphs is that the results are sometimes uninformative because of the edge uncertainty. In this paper, we consider probabilistic reachability queries, which are one of the fundamental classes of graph queries. To make the results more informative, we adopt a crowdsourcing-based approach to clean the uncertain edges. However, considering the time and monetary cost of crowdsourcing, it is a problem to efficiently select a limited set of edges for cleaning that maximizes the quality improvement. We prove that the edge selection problem is #P-hard. In light of the hardness of the problem, we propose a series of edge selection algorithms, followed by a number of optimization techniques and pruning heuristics for reducing the computation time. Our experimental results demonstrate that our proposed techniques outperform a random selection by up to 27 times in terms of the result quality improvement and the brute-force solution by up to 60 times in terms of the elapsed time.
UR - http://www.scopus.com/inward/record.url?scp=85020284092&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2684166
DO - 10.1109/TKDE.2017.2684166
M3 - Journal article
AN - SCOPUS:85020284092
SN - 1041-4347
VL - 29
SP - 1452
EP - 1465
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 7
ER -