TY - GEN
T1 - Semi-supervised clustering of graph objects
T2 - 17th International Conference on Database Systems for Advanced Applications, DASFAA 2012
AU - Huang, Xin
AU - Cheng, Hong
AU - Yang, Jiong
AU - Yu, Jeffery Xu
AU - Fei, Hongliang
AU - Huan, Jun
PY - 2012/4
Y1 - 2012/4
N2 - Semi-supervised clustering has recently received a lot of attention in the literature, which aims to improve the clustering performance with limited supervision. Most existing semi-supervised clustering studies assume that the data is represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those semi-supervised clustering methods are not directly applicable to clustering such graph objects. In this paper, we study the problem of semi-supervised clustering of data objects which are represented as graphs. The supervision information is in the form of pairwise constraints of must-links and cannot-links. As there is no predefined feature set for the graph objects, we propose to use discriminative subgraph patterns as the features. We design an objective function which incorporates the constraints to guide the subgraph feature mining and selection process. We derive an upper bound of the objective function based on which, a branch-and-bound algorithm is proposed to speedup subgraph mining. We also introduce a redundancy measure into the feature selection process in order to reduce the redundancy in the feature set. When the graph objects are represented in the vector space of the discriminative subgraph features, we use semi-supervised kernel K-means to cluster all graph objects. Experimental results on real-world protein datasets demonstrate that the constraint information can effectively guide the feature selection and clustering process and achieve satisfactory clustering performance.
AB - Semi-supervised clustering has recently received a lot of attention in the literature, which aims to improve the clustering performance with limited supervision. Most existing semi-supervised clustering studies assume that the data is represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those semi-supervised clustering methods are not directly applicable to clustering such graph objects. In this paper, we study the problem of semi-supervised clustering of data objects which are represented as graphs. The supervision information is in the form of pairwise constraints of must-links and cannot-links. As there is no predefined feature set for the graph objects, we propose to use discriminative subgraph patterns as the features. We design an objective function which incorporates the constraints to guide the subgraph feature mining and selection process. We derive an upper bound of the objective function based on which, a branch-and-bound algorithm is proposed to speedup subgraph mining. We also introduce a redundancy measure into the feature selection process in order to reduce the redundancy in the feature set. When the graph objects are represented in the vector space of the discriminative subgraph features, we use semi-supervised kernel K-means to cluster all graph objects. Experimental results on real-world protein datasets demonstrate that the constraint information can effectively guide the feature selection and clustering process and achieve satisfactory clustering performance.
KW - frequent subgraph mining
KW - Semi-supervised clustering
UR - http://www.scopus.com/inward/record.url?scp=84860687656&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-29038-1_16
DO - 10.1007/978-3-642-29038-1_16
M3 - Conference proceeding
AN - SCOPUS:84860687656
SN - 9783642290374
T3 - Lecture Notes in Computer Science
SP - 197
EP - 212
BT - Database Systems for Advanced Applications
A2 - Lee, Sang-goo
A2 - Peng, Zhiyong
A2 - Zhou, Xiaofang
A2 - Moon, Yang-Sae
A2 - Unland, Rainer
A2 - Yoo, Jaesoo
Y2 - 15 April 2012 through 18 April 2012
ER -