Abstract
Approximate Nearest Neighbor Search (ANNS) is a critical problem in vector databases. Cluster-based index is utilized to narrow the search scope of ANNS, thereby accelerating the search process. Due to its scalability, it is widely employed in real-world vector search systems. However, existing cluster-based indexes often suffer from coarse granularity, requiring query vectors to compute distances with vectors of varying quality, thus increasing query complexity. Existing work aim to represent vectors with minimal cost, such as using product quantization (PQ) or linear transformations, to speed up ANNS. However, these approaches do not address the coarse granularity inherent in cluster-based index. In this paper, we present an efficient vector data query engine to enhance the granularity of cluster-based index by carefully subdividing clusters using diverse distance metrics. Building on this refined index, we introduce techniques that leverage triangle inequalities to develop highly optimized and distinct search strategies for clusters and vectors of varying qualities, thereby reducing the overhead of ANNS. Extensive experiments demonstrate that our method significantly outperforms existing in-memory cluster-based indexing algorithms, achieving up to an impressive 10 speedup and a pruning ratio exceeding 99.4%.
Original language | English |
---|---|
Article number | 82 |
Pages (from-to) | 1-28 |
Number of pages | 28 |
Journal | Proceedings of the ACM on Management of Data |
Volume | 3 |
Issue number | 1 |
DOIs | |
Publication status | Published - 1 Feb 2025 |
User-Defined Keywords
- pruning compression
- triangle inequalities
- vector data query engine