TY - JOUR
T1 - Keys for graphs
AU - Fan, Wenfei
AU - Fan, Zhe
AU - Tian, Chao
AU - Dong, Xin Luna
N1 - Fan is supported in part by NSFC 61133002, 973 Program 2012CB316200 and 2014CB340302, ERC-2014-AdG 652976, Guangdong Innovative Research Team Program 2011D005, Shenzhen Peacock Program 1105100030834361, EPSRC EP/J015377/1 and EP/M025268/1, NSF III 1302212, and a Google Faculty Research Award.
PY - 2015/8
Y1 - 2015/8
N2 - Keys for graphs aim to uniquely identify entities represented by vertices in a graph. We propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. Extending conventional keys for relations and XML, these keys find applications in object identification, knowledge fusion and social network reconciliation. As an application, we study the entity matching problem that, given a graph G and a set ∑ of keys, is to find all pairs of entities (vertices) in G that are identified by keys in ∑. We show that the problem is intractable, and cannot be parallelized in logarithmic rounds. Nonetheless, we provide two parallel scalable algorithms for entity matching, in MapReduce and a vertex-centric asynchronous model. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.
AB - Keys for graphs aim to uniquely identify entities represented by vertices in a graph. We propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. Extending conventional keys for relations and XML, these keys find applications in object identification, knowledge fusion and social network reconciliation. As an application, we study the entity matching problem that, given a graph G and a set ∑ of keys, is to find all pairs of entities (vertices) in G that are identified by keys in ∑. We show that the problem is intractable, and cannot be parallelized in logarithmic rounds. Nonetheless, we provide two parallel scalable algorithms for entity matching, in MapReduce and a vertex-centric asynchronous model. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84953864096&partnerID=8YFLogxK
U2 - 10.14778/2824032.2824056
DO - 10.14778/2824032.2824056
M3 - Conference article
AN - SCOPUS:84953864096
SN - 2150-8097
VL - 8
SP - 1590
EP - 1601
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 12
T2 - 41st International Conference on Very Large Data Bases, VLDB 2015
Y2 - 31 August 2015 through 4 September 2015
ER -