Exploring the Structural Critical Entities in Big Attributed Graphs

  • ZHANG, Zhiwei (PI)

Project: Research project

Project Details


The graph model has been widely used to model complex relationships among entities. A road network can be considered as a graph, where road intersections are nodes and road segments are edges; a social network (Facebook, Twitter, Flickr, etc.) can be modeled as a graph, where people are nodes and their relationships are edges; using a node to represent each webpage, a web graph can be constructed according to the links between pages. These graphs are large and growing rapidly. Also, they contain rich textual information. For example, according to the statistics, in 2016, Facebook had more than 1.7 billion active users with labels or tags for them. Moreover, the rate of increase of active users from 2015 to 2016 was 15%. Existing works on cohesive subgraph queries, aim to find not only the vertices containing certain keywords, but also the connections between these vertices. As the graph is big, the results for these queries are often large, even i f they have restricted that the result and subgraphs should be dense and cohesive. For example, considering the DBLP data set which contains authors, publications and their citation relationships, a cohesive subgraph query with the keywords “data mining” returns a subgraph with more than 3,000 vertices. Identifying critical entities in such large graph is a challenging task for users, especially considering the situation that the users lack the knowledge about the underlying data graph.

In this project, we focus on the problems of finding the structural critical entities of queries in attributed graphs. The critical entities could be vertices, keywords or subgraph patterns in our project. To address the issues of previous studies in tackling the problems, we propose a new framework based on cohesive structural exploration and cohesive subgraph collapse. Our methodologies focus on three aspects. The first is that, given a graph query, different semantics may lead to different evaluations for critical entities. The critical vertices, keywords and subgraph patterns should be evaluated based on the whole cohesive structures. Second, the critical entities should be closely related to the graph patterns, even if they belong to the same cohesive subgraphs. Third, we optimize the computational cost for higher efficiency. Specifically, we leverage graph label distributions and indexing approaches to reduce redundant computation and improve the time efficiency.

The novelty of this project lies in its exploration of structural critical entities. We will concentrate on how to identify critical vertices, keywords or subgraph patterns, how to compute the critical entities efficiently, and how to integrate all the critical queries into a unified graph system. We will conduct extensive performance studies using real world data in our prototype system, demonstrate the strengths of our proposed approaches.
Effective start/end date1/01/1931/12/21


Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.