Vector Similarity Search in High-Dimensional Spaces

Project: Research project

Project Details

Description

Vector similarity search plays a crucial role in many data science and AI applications, such as natural language understanding, recommendation systems, image/video processing, and anomaly detection. Particularly with the emergence of ChatGPT and other generative AI technologies, the demand for vector similarity search in high-dimensional spaces has intensified. However, conducting efficient and accurate similarity search in high-dimensional vector spaces is challenging due to the curse of dimensionality.

Despite recent advancements, existing vector similarity search methods have several limitations. Firstly, state-of-the-art techniques suffer from excessively large search time complexities, with a factor of 2^m (where m is the dimensionality of vectors), making them inefficient for practical applications. Moreover, methods for vector similarity search in multimetric spaces often lack accuracy guarantees or suffer from inefficiencies. Additionally, current approaches are memory-intensive and I/O-unfriendly, resulting in significant costs when dealing with large-scale databases that are stored on external storage.

To address these limitations, this research proposal aims to investigate novel approaches to vector similarity search in high-dimensional vector spaces. Specifically, we propose four research tasks to tackle the research questions. First, we aim to reduce the search time complexity by developing a novel ball-cover-based proximity graph (PG) for indexing vector data and designing scalable PGs with respect to different data expansion rates. Second, we will propose a new margin-allowed proximity graph (MPG) framework to efficiently support multi-vector similarity search while ensuring accurate search results. Third, we will explore a specifically designed hierarchical PG framework to minimize I/O costs and enable efficient vector similarity search in large-scale databases. Finally, comprehensive theoretical analysis and empirical studies will be conducted to evaluate the proposed techniques and algorithms.
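
For readers unfamiliar with proximity graphs, the sketch below shows the generic idea that the tasks above build on: each vector is a node linked to a few nearby vectors, and a query is answered by walking greedily toward closer nodes. It is only an illustration under stated assumptions, not the ball-cover-based PG, the MPG, or the hierarchical PG proposed in this project. The function names (`build_knn_graph`, `greedy_search`), the brute-force k-nearest-neighbour construction, and the random test data are all hypothetical choices made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbour graph: node index -> neighbour indices.

    Assumption: a plain kNN graph stands in for a proximity graph here.
    """
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: euclidean(p, points[j]),
        )
        graph[i] = others[:k]
    return graph


def greedy_search(points, graph, query, start):
    """Move to the closest neighbour until no neighbour is closer to the query."""
    current = start
    current_dist = euclidean(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for nb in graph[current]:
            d = euclidean(points[nb], query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:
            # Local minimum: may or may not be the true nearest neighbour.
            return current, current_dist
        current, current_dist = best, best_dist


# Hypothetical data: 200 random vectors with m = 16 dimensions.
random.seed(0)
points = [[random.random() for _ in range(16)] for _ in range(200)]
graph = build_knn_graph(points, k=8)
query = [random.random() for _ in range(16)]

found, found_dist = greedy_search(points, graph, query, start=0)
exact_dist = min(euclidean(p, query) for p in points)
```

The greedy walk stops at a local minimum of the graph, so `found_dist` can be larger than `exact_dist`. Such a search gives no accuracy guarantee on its own, which is the kind of gap the proposed frameworks aim to address.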

Building on our extensive experience in query processing and data management, we expect this research to advance the study of vector similarity search and contribute to the development of more efficient and accurate techniques for handling high-dimensional vectorized data, ultimately benefiting the broader data science and AI community.

Status: Not started
Effective start/end date: 1/01/25 to 31/12/27
