The proliferation of rich information available for real world entities and their relationships gives rise to a general type of graph, namely multi-valued attributed graph, where graph vertices are associated with a number of attributes and a vertex may have multiple values on an attribute. It would be useful if we can cluster such graphs into densely connected components with homogeneous attribute values. Recent work has studied graph clustering in attributed graphs considering the full attribute space. However, full space clustering often leads to poor results due to irrelevant attributes.
In this paper, we study subspace clustering in multi-valued attributed graph and propose an algorithm SCMAG for community detection. Our algorithm uses a cell-based subspace clustering approach and identifies cells with dense connectivity in the subspaces. Random walk with restart is used to measure the structural connectivity and attribute similarity. An indexing scheme is designed to support efficiently calculating cell connectivity from random walk scores. We also propose a new cell combining strategy on dimensions of categorical attributes and a novel mechanism to handle multi-valued attributes. Experimental results on IMDB data and bibliographic data demonstrate that SCMAG significantly outperforms the state-of-the-art subspace clustering algorithm and attributed graph clustering algorithm. Case studies show SCMAG can find dense communities with homogeneous properties under different subspaces.
Scopus Subject Areas
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence
- Attributed graph
- Community detection
- Dense subgraph
- Random walk with restart