TY - GEN
T1 - Maintaining Acyclic Foreign-Key Joins under Updates
AU - Wang, Qichen
AU - Yi, Ke
N1 - Funding Information:
This work has been supported by HKRGC under grants 16202317, 16201318, and 16201819, as well as a grant from Alibaba Group. We would also like to thank the anonymous reviewers who have made valuable suggestions on improving the presentation of the paper.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/6/11
Y1 - 2020/6/11
N2 - A large number of analytical queries (e.g., all the 22 queries in the TPC-H benchmark) are based on acyclic foreign-key joins. In this paper, we study the problem of incrementally maintaining the query results of these joins under updates, i.e., insertion and deletion of tuples to any of the relations. Prior work has shown that this problem is inherently hard, requiring at least ω(|db|1/2 -ϵ) time per update, where |db| is the size of the database, and ϵ > 0 can be any small constant. However, this negative result holds only on adversarially constructed update sequences; on the other hand, most real-world update sequences are "nice", nowhere near these worst-case scenarios. We introduce a measure λ, which we call the enclosureness of the update sequence, to more precisely characterize its intrinsic difficulty. We present an algorithm to maintain the query results of any acyclic foreign-key join in O(λ) time amortized, on any update sequence whose enclosureness is λ. This is complemented with a lower bound of ω(λ1-ϵ), showing that our algorithm is essentially optimal with respect to λ. Next, using this algorithm as the core component, we show how all the 22 queries in the TPC-H benchmark can be supported in ∼O(łambda) time. Finally, based on the algorithms developed, we built a continuous query processing system on top of Flink, and experimental results show that our system outperforms previous ones significantly.
AB - A large number of analytical queries (e.g., all the 22 queries in the TPC-H benchmark) are based on acyclic foreign-key joins. In this paper, we study the problem of incrementally maintaining the query results of these joins under updates, i.e., insertion and deletion of tuples to any of the relations. Prior work has shown that this problem is inherently hard, requiring at least ω(|db|1/2 -ϵ) time per update, where |db| is the size of the database, and ϵ > 0 can be any small constant. However, this negative result holds only on adversarially constructed update sequences; on the other hand, most real-world update sequences are "nice", nowhere near these worst-case scenarios. We introduce a measure λ, which we call the enclosureness of the update sequence, to more precisely characterize its intrinsic difficulty. We present an algorithm to maintain the query results of any acyclic foreign-key join in O(λ) time amortized, on any update sequence whose enclosureness is λ. This is complemented with a lower bound of ω(λ1-ϵ), showing that our algorithm is essentially optimal with respect to λ. Next, using this algorithm as the core component, we show how all the 22 queries in the TPC-H benchmark can be supported in ∼O(łambda) time. Finally, based on the algorithms developed, we built a continuous query processing system on top of Flink, and experimental results show that our system outperforms previous ones significantly.
KW - acyclic joins
KW - incremental view maintenance
KW - query evaluation under updates
KW - sliding windows
UR - http://www.scopus.com/inward/record.url?scp=85086262806&partnerID=8YFLogxK
U2 - 10.1145/3318464.3380586
DO - 10.1145/3318464.3380586
M3 - Conference contribution
AN - SCOPUS:85086262806
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1225
EP - 1239
BT - SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery (ACM)
T2 - ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Y2 - 14 June 2020 through 19 June 2020
ER -