LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases

Yun Peng, Byron Koon Kau Choi, Tsz Nam Chan, Jianliang Xu

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

5 Citations (Scopus)

Abstract

The problem of k-nearest neighbor (k-NN) search is fundamental in graph databases, which has numerous real-world applications, such as bioinformatics, computer vision, and software engineering. Graph edit distance (GED) and maximum common subgraph (MCS)-based distance are the most widely used distance measures in k-NN search. However, computing the exact k-NNs of a query graph Q using these measures is prohibitively time-consuming, as a large number of graph distance computations is needed, and computing GED and MCS are both NP-hard. In this paper, we study the approximate k-nearest neighbor (k-ANN) search with the aim of trading efficiency with a slight decrease in accuracy. Greedy routing on the proximity graph (PG) index is a state-of-the-art method for k-ANN search. However, such routing algorithms are not designed for graph databases, and simple adoption is inefficient. The core reason is that the exhaustive neighbor exploration at each routing step incurs a large number of distance computations (NDC). In this paper, we propose a learning-based k-ANN search method to reduce NDC. First, we propose to prune unpromising neighbors from distance computations. We use a graph learning model to rank the neighbors at each routing step and explore only the top neighbors. For the accuracy of rank prediction, we propose a neighbor ranking model that works only in the neighborhood of Q. Second, we propose a learning-based method to select the initial node for the routing. The initial node selected has a high probability of being in the neighborhood of Q, such that the neighbor ranking model can be used. Third, we propose a compressed GNN-graph to accelerate the neighbor ranking model and the initial node selection model. We prove that learning efficiency is improved without degrading the accuracy. Our extensive experiments show that our method is about 3.6x to 18.6x faster than the state-of-the-art methods on real-world datasets.
Original languageEnglish
Title of host publicationProceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PublisherIEEE
Pages2508-2521
Number of pages14
ISBN (Electronic)9781665408837
ISBN (Print)9781665408844
DOIs
Publication statusPublished - May 2022
Event38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Kuala Lumpur, Malaysia
Duration: 9 May 202212 May 2022
https://icde2022.ieeecomputer.my/
https://ieeexplore.ieee.org/xpl/conhome/9835153/proceeding

Publication series

NameProceedings of IEEE International Conference on Data Engineering (ICDE)
ISSN (Print)1063-6382
ISSN (Electronic)2375-026X

Conference

Conference38th IEEE International Conference on Data Engineering, ICDE 2022
Country/TerritoryMalaysia
CityKuala Lumpur
Period9/05/2212/05/22
Internet address

Scopus Subject Areas

  • Software
  • Information Systems
  • Signal Processing

User-Defined Keywords

  • Approximate k-NN search
  • GNN acceleration
  • Graph database
  • Learning to route
  • Proximity graph

Fingerprint

Dive into the research topics of 'LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases'. Together they form a unique fingerprint.

Cite this