Block Laplacian Matrices, Spectral Clustering and Eigendecomposition

Project: Research project

Project Details

Description

In data science, data is often represented by using an undirected graph with vertices and edges. Here a vertex represents an entity, and an edge describes a relationship between two entities. In many real world applications, there can be many relations arising from different sources and/or different types of models. For instance, multiple graphs can be studied by considering gene expression data under different diseases in bioinformatics or users connec- tion data under different media in social networks. In this proposal, we are interested in clustering of multiple graphs over the same set of vertices where the edges describe differ- ent relations between entities. This kind of multiple relations graphs can provide a better way of understanding, classifying and grouping objects for analysis and applications. Ex- isting methods involve variational Bayesian estimation techniques, tensor computation or optimization techniques for matrix factorization. These techniques may be computational expensive or/and these methods assume a fixed cluster structure for all the graphs.

In this project, we would like to develop efficient and effective solution methods for clus- tering of multiple graphs. Our idea is to construct block Laplacian matrices for multiple graphs. The main diagonal blocks are just Laplacian matrices from each graph. Because we deal with the same set of vertices, its off-diagonal blocks are set to be the minus iden- tity matrix. We can identify both the clusters in each graph and determine the consistent clusters in multiple graphs. We will show that the multiplicity of the zero eigenvalue of the constructed block Laplacian matrix is equal to the number of connected components in multiple graphs. Eigenvectors of the constructed block Laplacian matrix can be computed efficiently and effectively used for clustering of vertices. We will show that eigenvectors are the solutions of the relaxation of multiple graphs cut problems. The lower and upper bounds of the optimal values of such multiple graphs cut problems can be established so that the performance of the proposed block spectral clustering method can be evaluated. Also we propose and develop the framework for normalized block Laplacian matrices for multiple graphs. Theoretical results and properties of normalized block Laplacian matrices will be investigated and analyzed.

On the other hand, inter-relations among multiple graphs can be employed, i.e., the link- age between two vertices in two different graphs. For instance, in social network, a user may share a picture in “Facebook” to another user in “Instagram”, or vice versa. Inter-relations would be very useful to discover cluster structure in multiple graphs. Correspondingly, block Laplacian and block normalized Laplacian matrices can be constructed and their theoretical results (the multiplicity of the zero eigenvalue, the relaxation of multiple graphs cut problem, and the lower and upper bounds of their optimal values) can be studied and analyzed. To the best of our knowledge, our proposed work is the first attempt to address clustering of this kind of multiple graphs.

In the proposal, we will employ synthetic and real data sets (gene expression data, pub- lication data social network data and World Wide Web data) to test the clustering and computational performance of the proposed block spectral clustering algorithms for multiple graphs, and compare it with other existing methods. By using these testing data sets, we will study and evaluate further optimization methods for block spectral clustering so that subspace solutions can be correlated maximally and clusters in each graph can be consistent optimally across different graphs.
StatusFinished
Effective start/end date1/01/1631/12/18

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.