Computing the Difference of Conjunctive Queries Efficiently

Xiao Hu, Qichen Wang

Research output: Contribution to journalJournal articlepeer-review

Abstract

We investigate how to efficiently compute the difference result of two (or multiple) conjunctive queries, which is the last operator in relational algebra to be unraveled. The standard approach in practical database systems is to materialize the results for every input query as a separate set, and then compute the difference of two (or multiple) sets. This approach is bottlenecked by the complexity of evaluating every input query individually, which could be very expensive, particularly when there are only a few results in the difference. In this paper, we introduce a new approach by exploiting the structural property of input queries and rewriting the original query by pushing the difference operator down as much as possible. We show that for a large class of difference queries, this approach can lead to a linear-time algorithm, in terms of the input size and (final) output size, i.e., the number of query results that survive from the difference operator. We complete this result by showing the hardness of computing the remaining difference queries in linear time. Although a linear-time algorithm is hard to achieve in general, we also provide some heuristics that can provably improve the standard approach. At last, we compare our approach with standard SQL engines over graph and benchmark datasets. The experiment results demonstrate order-of-magnitude speedups achieved by our approach over the vanilla SQL engine.
Original languageEnglish
Article number153
Number of pages26
JournalProceedings of the ACM on Management of Data
Volume1
Issue number2
DOIs
Publication statusPublished - Jun 2023

User-Defined Keywords

  • conjunctive query
  • difference operator
  • query optimization

Fingerprint

Dive into the research topics of 'Computing the Difference of Conjunctive Queries Efficiently'. Together they form a unique fingerprint.

Cite this