Abstract
The problem of k-nearest neighbor (k-NN) search is fundamental in graph databases, which has numerous real-world applications, such as bioinformatics, computer vision, and software engineering. Graph edit distance (GED) and maximum common subgraph (MCS)-based distance are the most widely used distance measures in k-NN search. However, computing the exact k-NNs of a query graph Q using these measures is prohibitively time-consuming, as a large number of graph distance computations is needed, and computing GED and MCS are both NP-hard. In this paper, we study the approximate k-nearest neighbor (k-ANN) search with the aim of trading efficiency with a slight decrease in accuracy. Greedy routing on the proximity graph (PG) index is a state-of-the-art method for k-ANN search. However, such routing algorithms are not designed for graph databases, and simple adoption is inefficient. The core reason is that the exhaustive neighbor exploration at each routing step incurs a large number of distance computations (NDC). In this paper, we propose a learning-based k-ANN search method to reduce NDC. First, we propose to prune unpromising neighbors from distance computations. We use a graph learning model to rank the neighbors at each routing step and explore only the top neighbors. For the accuracy of rank prediction, we propose a neighbor ranking model that works only in the neighborhood of Q. Second, we propose a learning-based method to select the initial node for the routing. The initial node selected has a high probability of being in the neighborhood of Q, such that the neighbor ranking model can be used. Third, we propose a compressed GNN-graph to accelerate the neighbor ranking model and the initial node selection model. We prove that learning efficiency is improved without degrading the accuracy. Our extensive experiments show that our method is about 3.6x to 18.6x faster than the state-of-the-art methods on real-world datasets.
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022 |
Publisher | IEEE |
Pages | 2508-2521 |
Number of pages | 14 |
ISBN (Electronic) | 9781665408837 |
ISBN (Print) | 9781665408844 |
DOIs | |
Publication status | Published - May 2022 |
Event | 38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Kuala Lumpur, Malaysia Duration: 9 May 2022 → 12 May 2022 https://icde2022.ieeecomputer.my/ https://ieeexplore.ieee.org/xpl/conhome/9835153/proceeding |
Publication series
Name | Proceedings of IEEE International Conference on Data Engineering (ICDE) |
---|---|
ISSN (Print) | 1063-6382 |
ISSN (Electronic) | 2375-026X |
Conference
Conference | 38th IEEE International Conference on Data Engineering, ICDE 2022 |
---|---|
Country/Territory | Malaysia |
City | Kuala Lumpur |
Period | 9/05/22 → 12/05/22 |
Internet address |
Scopus Subject Areas
- Software
- Information Systems
- Signal Processing
User-Defined Keywords
- Approximate k-NN search
- GNN acceleration
- Graph database
- Learning to route
- Proximity graph