TY - JOUR
T1 - TRIM: Accelerating High-Dimensional Vector Similarity Search with Enhanced Triangle-Inequality-Based Pruning
AU - Song, Yitong
AU - Zhang, Pengcheng
AU - Gao, Chao
AU - Yao, Bin
AU - Wang, Kai
AU - Wu, Zongyuan
AU - Qu, Lin
N1 - Funding information:
This work was sponsored by the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University (SL2023ZD102) and Alibaba Group through Alibaba Innovative Research (AIR) Program.
Publisher copyright:
© 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/12/5
Y1 - 2025/12/5
N2 - High-dimensional vector similarity search (HVSS) is critical for many data processing and AI applications. However, traditional HVSS methods often require extensive data access for distance calculations, leading to inefficiencies. Triangle-inequality-based lower bound pruning is a widely used technique to reduce the number of data access in low-dimensional spaces but becomes less effective in high-dimensional settings. This is attributed to the ''distance concentration'' phenomenon, where the lower bounds derived from the triangle inequality become too small to be useful. To address this, we propose TRIM, which enhances the effectiveness of traditional triangle-inequality-based pruning in high-dimensional vector similarity search using two key ways: (1) optimizing landmark vectors used to form the triangles, and (2) relaxing the lower bounds derived from the triangle inequality, with the relaxation degree adjustable according to user's needs. TRIM is a versatile operation that can be seamlessly integrated into both memory-based (e.g., HNSW, IVFPQ) and disk-based (e.g., DiskANN) HVSS methods, reducing distance calculations and disk access. Extensive experiments show that TRIM enhances memory-based methods, improving graph-based search by up to 90% and quantization-based search by up to 200%, while achieving a pruning ratio of up to 99%. It also reduces I/O costs by up to 58% and improves efficiency by 102% for disk-based methods, while preserving high query accuracy. Our source code is available at https://github.com/petrizhang/TRIM.
AB - High-dimensional vector similarity search (HVSS) is critical for many data processing and AI applications. However, traditional HVSS methods often require extensive data access for distance calculations, leading to inefficiencies. Triangle-inequality-based lower bound pruning is a widely used technique to reduce the number of data access in low-dimensional spaces but becomes less effective in high-dimensional settings. This is attributed to the ''distance concentration'' phenomenon, where the lower bounds derived from the triangle inequality become too small to be useful. To address this, we propose TRIM, which enhances the effectiveness of traditional triangle-inequality-based pruning in high-dimensional vector similarity search using two key ways: (1) optimizing landmark vectors used to form the triangles, and (2) relaxing the lower bounds derived from the triangle inequality, with the relaxation degree adjustable according to user's needs. TRIM is a versatile operation that can be seamlessly integrated into both memory-based (e.g., HNSW, IVFPQ) and disk-based (e.g., DiskANN) HVSS methods, reducing distance calculations and disk access. Extensive experiments show that TRIM enhances memory-based methods, improving graph-based search by up to 90% and quantization-based search by up to 200%, while achieving a pruning ratio of up to 99%. It also reduces I/O costs by up to 58% and improves efficiency by 102% for disk-based methods, while preserving high query accuracy. Our source code is available at https://github.com/petrizhang/TRIM.
KW - High-dimensional vector similarity search
KW - Triangle inequality
KW - Approximate 𝑘 nearest neighbor search
KW - Approximate range search
U2 - 10.1145/3769838
DO - 10.1145/3769838
M3 - Journal article
SN - 2836-6573
VL - 3
SP - 1
EP - 26
JO - Proceedings of the ACM on Management of Data
JF - Proceedings of the ACM on Management of Data
IS - 6
M1 - 373
ER -