## Abstract

Graph edit distance (GED) computation is a fundamental NP-hard problem in graph theory. Given a graph pair (G1, G2), GED is defined as the minimum number of primitive operations converting G1 to G2. Because the problem is NP-hard, early studies focus on inexact algorithms, such as search-based A*-beam search and greedy algorithms using bipartite matching. They obtain a sub-optimal solution by constructing an edit path (the sequence of operations that converts G1 to G2). Recent studies convert the GED between a given graph pair (G1, G2) into a similarity score in the range (0, 1) by a well-designed function. Machine learning models (mostly based on graph neural networks) are then applied to predict the similarity score. They achieve a much higher numerical precision than the sub-optimal solutions found by classical algorithms. However, a major limitation is that these machine learning models cannot generate an edit path. They treat GED computation as a pure regression task to bypass its intrinsic complexity, but ignore the essential task of converting G1 to G2. This severely limits the interpretability and usability of the solution.
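The abstract does not say which function maps a GED value to a similarity score. As an illustration only, the sketch below assumes the exponential normalization commonly used in earlier GNN-based work: the GED is divided by the average node count and passed through `exp(-x)`. The function names are hypothetical and are not taken from the paper.

```python
import math


def ged_to_similarity(ged: float, n1: int, n2: int) -> float:
    """Map a GED value to a similarity score in (0, 1].

    Assumption: exponential normalization by the average number of nodes;
    the abstract itself does not specify the function.
    """
    normalized = ged / ((n1 + n2) / 2)  # keep graph size from dominating
    return math.exp(-normalized)


def similarity_to_ged(score: float, n1: int, n2: int) -> float:
    """Invert ged_to_similarity to recover a (real-valued) GED estimate."""
    return -math.log(score) * (n1 + n2) / 2
```

A regression model trained on such a score yields only a number. Inverting the score recovers a GED estimate but no sequence of edit operations, which is the limitation described above.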

In this paper, we propose a novel deep learning framework that solves the GED problem in two steps: 1) the proposed graph neural network, GEDGNN, predicts the GED value and a matching matrix; and 2) a post-processing algorithm based on k-best matching derives k possible node matchings from the matching matrix generated by GEDGNN. The best of these matchings then yields a high-quality edit path. Extensive experiments are conducted on three real graph data sets and synthetic power-law graphs to demonstrate the effectiveness of our framework. Compared to the best result of existing GNN-based models, the mean absolute error (MAE) on GED value prediction decreases by 4.9% to 74.3%. Compared to the state-of-the-art search algorithm Noah, the MAE on the GED value derived from the edit path decreases by 53.6% to 88.1%.
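The second step can be illustrated with a toy sketch. It is not the paper's algorithm: the k best matchings are found by brute-force enumeration rather than an efficient k-best matching procedure, the graphs are assumed to be unlabeled and undirected with G1 no larger than G2, and the score matrix is hand-written where GEDGNN would supply one. All function names are hypothetical.

```python
from itertools import permutations


def induced_edit_cost(n1, edges1, n2, edges2, mapping):
    """Cost of the edit path implied by a node mapping (requires n1 <= n2).

    mapping[u] is the G2 node matched to G1 node u. Assumes unlabeled,
    undirected graphs, so only node insertions and edge edits are counted.
    """
    node_insertions = n2 - n1  # unmatched G2 nodes must be inserted
    mapped1 = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
    target2 = {frozenset(e) for e in edges2}
    # Edges present on only one side must be deleted or inserted.
    return node_insertions + len(mapped1 ^ target2)


def k_best_matchings(score, k):
    """Return the k injective mappings with the highest total score.

    Brute force for illustration only; feasible for tiny graphs.
    """
    n1, n2 = len(score), len(score[0])
    candidates = permutations(range(n2), n1)
    return sorted(
        candidates,
        key=lambda m: -sum(score[u][m[u]] for u in range(n1)),
    )[:k]


def best_edit_cost(n1, edges1, n2, edges2, score, k):
    """Pick the candidate matching whose induced edit path is cheapest."""
    best = min(
        k_best_matchings(score, k),
        key=lambda m: induced_edit_cost(n1, edges1, n2, edges2, m),
    )
    return induced_edit_cost(n1, edges1, n2, edges2, best), best


# G1 is a path on 3 nodes, G2 a path on 4 nodes (true GED is 2).
edges1 = [(0, 1), (1, 2)]
edges2 = [(0, 1), (1, 2), (2, 3)]
# Hand-written stand-in for a predicted matching matrix; its top-scoring
# matching (0, 2, 1) is deliberately not the optimal one.
score = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.4, 0.5, 0.0],
    [0.0, 0.5, 0.4, 0.1],
]
cost_k1, _ = best_edit_cost(3, edges1, 4, edges2, score, k=1)
cost_k2, matching_k2 = best_edit_cost(3, edges1, 4, edges2, score, k=2)
```

In this example the top-scoring matching induces an edit path of cost 4, while the second-best matching induces the optimal cost of 2. This is why examining k candidates can recover a better edit path than trusting the single highest-scoring matching.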

| Original language | English |
|---|---|
| Pages (from-to) | 1817-1829 |
| Number of pages | 13 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 16 |
| Issue number | 8 |
| Publication status | Published - 1 Apr 2023 |
| Event | 49th International Conference on Very Large Data Bases, VLDB 2023, Sheraton Vancouver Wall Centre, Vancouver, Canada. Duration: 28 Aug 2023 → 1 Sept 2023. https://vldb.org/2023/ (conference website), https://vldb.org/2023/?program-structure (conference programme) |