TY - GEN
T1 - K-selection query over uncertain data
AU - Liu, Xingjie
AU - Ye, Mao
AU - XU, Jianliang
AU - Tian, Yuan
AU - Lee, Wang Chien
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - This paper studies a new query on uncertain data, called k-selection query. Given an uncertain dataset of N objects, where each object is associated with a preference score and a presence probability, a k-selection query returns k objects such that the expected score of the "best available" objects is maximized. This query is useful in many application domains such as entity web search and decision making. In evaluating k-selection queries, we need to compute the expected best score (EBS) for candidate k-selection sets and search for the optimal selection set with the highest EBS. Those operations are costly due to the extremely large search space. In this paper, we identify several important properties of k-selection queries, including EBS decomposition, query recursion, and EBS bounding. Based upon these properties, we first present a dynamic programming (DP) algorithm that answers the query in O(k · N) time. Further, we propose a Bounding-and-Pruning (BP) algorithm, that exploits effective search space pruning strategies to find the optimal selection without accessing all objects. We evaluate the DP and BP algorithms using both synthetic and real data. The results show that the proposed algorithms outperform the baseline approach by several orders of magnitude.
AB - This paper studies a new query on uncertain data, called k-selection query. Given an uncertain dataset of N objects, where each object is associated with a preference score and a presence probability, a k-selection query returns k objects such that the expected score of the "best available" objects is maximized. This query is useful in many application domains such as entity web search and decision making. In evaluating k-selection queries, we need to compute the expected best score (EBS) for candidate k-selection sets and search for the optimal selection set with the highest EBS. Those operations are costly due to the extremely large search space. In this paper, we identify several important properties of k-selection queries, including EBS decomposition, query recursion, and EBS bounding. Based upon these properties, we first present a dynamic programming (DP) algorithm that answers the query in O(k · N) time. Further, we propose a Bounding-and-Pruning (BP) algorithm, that exploits effective search space pruning strategies to find the optimal selection without accessing all objects. We evaluate the DP and BP algorithms using both synthetic and real data. The results show that the proposed algorithms outperform the baseline approach by several orders of magnitude.
UR - http://www.scopus.com/inward/record.url?scp=78650478588&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12026-8_34
DO - 10.1007/978-3-642-12026-8_34
M3 - Conference proceeding
AN - SCOPUS:78650478588
SN - 3642120253
SN - 9783642120251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 444
EP - 459
BT - Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
T2 - 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Y2 - 1 April 2010 through 4 April 2010
ER -