TY - JOUR
T1 - Reducing Uncertainty of Probabilistic Top-k Ranking via Pairwise Crowdsourcing
AU - Lin, Xin
AU - Xu, Jianliang
AU - Hu, Haibo
AU - Fan, Zhe
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China (Grant No.: 61572413, 61572193, 61502258 and U1636205), NSF of Shanghai (No.: 17ZR1444900), and Research Grants Council, Hong Kong SAR, China, under projects 12244916, 12201615, 12202414, 12200914, 15238116, and C1008-16G.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/10/1
Y1 - 2017/10/1
N2 - Probabilistic top-k ranking is an important and well-studied query operator in uncertain databases. However, the quality of top-k results might be heavily affected by the ambiguity and uncertainty of the underlying data. Uncertainty reduction techniques have been proposed to improve the quality of top-k results by cleaning the original data. Unfortunately, most data cleaning models aim to probe the exact values of the objects individually and therefore do not work well for subjective data types, such as user ratings, which are inherently probabilistic. In this paper, we propose a novel pairwise crowdsourcing model to reduce the uncertainty of top-k ranking using a crowd of domain experts. Given a crowdsourcing task of limited budget, we propose efficient algorithms to select the best object pairs for crowdsourcing that will bring in the highest quality improvement. Extensive experiments show that our proposed solutions outperform a random selection method by up to 30 times in terms of quality improvement of probabilistic top-k ranking queries. In terms of efficiency, our proposed solutions can reduce the elapsed time of a brute-force algorithm from several days to one minute.
AB - Probabilistic top-k ranking is an important and well-studied query operator in uncertain databases. However, the quality of top-k results might be heavily affected by the ambiguity and uncertainty of the underlying data. Uncertainty reduction techniques have been proposed to improve the quality of top-k results by cleaning the original data. Unfortunately, most data cleaning models aim to probe the exact values of the objects individually and therefore do not work well for subjective data types, such as user ratings, which are inherently probabilistic. In this paper, we propose a novel pairwise crowdsourcing model to reduce the uncertainty of top-k ranking using a crowd of domain experts. Given a crowdsourcing task of limited budget, we propose efficient algorithms to select the best object pairs for crowdsourcing that will bring in the highest quality improvement. Extensive experiments show that our proposed solutions outperform a random selection method by up to 30 times in terms of quality improvement of probabilistic top-k ranking queries. In terms of efficiency, our proposed solutions can reduce the elapsed time of a brute-force algorithm from several days to one minute.
KW - Crowdsourcing
KW - Top-k ranking
KW - Uncertain data management
UR - http://www.scopus.com/inward/record.url?scp=85021818313&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2717830
DO - 10.1109/TKDE.2017.2717830
M3 - Journal article
AN - SCOPUS:85021818313
SN - 1041-4347
VL - 29
SP - 2290
EP - 2303
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 10
ER -