Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities

Qian Xu, Juan Yang, Feng Zhang, Junda Pan, Kang Chen, Youren Shen, Amelie Chi Zhou, Xiaoyong Du

Research output: Contribution to journalJournal articlepeer-review

Abstract

Approximate Nearest Neighbor Search (ANNS) is a critical problem in vector databases. Cluster-based index is utilized to narrow the search scope of ANNS, thereby accelerating the search process. Due to its scalability, it is widely employed in real-world vector search systems. However, existing cluster-based indexes often suffer from coarse granularity, requiring query vectors to compute distances with vectors of varying quality, thus increasing query complexity. Existing work aim to represent vectors with minimal cost, such as using product quantization (PQ) or linear transformations, to speed up ANNS. However, these approaches do not address the coarse granularity inherent in cluster-based index. In this paper, we present an efficient vector data query engine to enhance the granularity of cluster-based index by carefully subdividing clusters using diverse distance metrics. Building on this refined index, we introduce techniques that leverage triangle inequalities to develop highly optimized and distinct search strategies for clusters and vectors of varying qualities, thereby reducing the overhead of ANNS. Extensive experiments demonstrate that our method significantly outperforms existing in-memory cluster-based indexing algorithms, achieving up to an impressive 10 speedup and a pruning ratio exceeding 99.4%.
Original languageEnglish
Article number82
Pages (from-to)1-28
Number of pages28
JournalProceedings of the ACM on Management of Data
Volume3
Issue number1
DOIs
Publication statusPublished - 1 Feb 2025

User-Defined Keywords

  • pruning compression
  • triangle inequalities
  • vector data query engine

Fingerprint

Dive into the research topics of 'Tribase: A Vector Data Query Engine for Reliable and Lossless Pruning Compression using Triangle Inequalities'. Together they form a unique fingerprint.

Cite this