TY - JOUR
T1 - Double selection based semi-supervised clustering ensemble for tumor clustering from gene expression profiles
AU - Yu, Zhiwen
AU - Chen, Hongsheng
AU - You, Jane
AU - Wong, Hau San
AU - Liu, Jiming
AU - Li, Le
AU - Han, Guoqiang
N1 - The work described in this paper was partially funded by a grant from the Hong Kong Scholars Program (Project No. XJ2012015) and supported by grants from the National Natural Science Foundation of China (NSFC) (Project Nos. 61273363, 61379033), the NSFC Guangdong Joint Fund (Project No. U1035004), the Guangdong Natural Science Funds for Distinguished Young Scholar (Project No. S2013050014677), a grant from Science and Technology Planning Project of Guangzhou (Project No. 11A11080267), a grant from China Postdoctoral Science Foundation (Project No. 2013M540655), a grant from Foundation of Guangdong Educational Committee (Project No. 2012KJCX0011), a grant from the Fundamental Research Funds for the Central Universities (Project No. 2014G0007), a grant from Key Enterprises and Innovation Organizations in Nanshan District in Shenzhen (Project No. KC2013ZDZJ0007A), a grant from Natural Science Foundation of Guangdong Province, China (Project No. S2012010009961), a grant from the Doctoral Program of Higher Education (Project No. 20110172120027), a grant from the Cooperation Project in Industry, Education and Academy of Guangdong Province and Ministry of Education of China (Project No. 2011B090400032), a grant from the key lab of cloud computing and big data in Guangzhou (Project No. SITGZ [2013]268-6), a grant from the Hong Kong Baptist University (Project No. RGC/HKBU211212), a grant from the City University of Hong Kong (Project No. 7004047) and grants from the Hong Kong Polytechnic University (G-YK77 and G-YK53).
PY - 2014/7
Y1 - 2014/7
N2 - Tumor clustering is one of the important techniques for tumor discovery from cancer gene expression profiles, which is useful for the diagnosis and treatment of cancer. While different algorithms have been proposed for tumor clustering, few make use of expert knowledge to improve the performance of tumor discovery. In this paper, we first view the expert's knowledge as constraints in the process of clustering, and propose a feature selection based semi-supervised cluster ensemble framework (FS-SSCE) for tumor clustering from bio-molecular data. Compared with traditional tumor clustering approaches, FS-SSCE has two distinguishing properties: (1) the adoption of feature selection techniques to dispel the effect of noisy genes, and (2) the employment of the binate constraint based K-means algorithm to take into account the effect of experts' knowledge. Then, we propose a double selection based semi-supervised cluster ensemble framework (DS-SSCE), which not only applies the feature selection technique to perform gene selection on the gene dimension, but also selects an optimal subset of representative clustering solutions in the ensemble and improves the performance of tumor clustering using the normalized cut algorithm. DS-SSCE also introduces a confidence factor into the process of constructing the consensus matrix by considering the prior knowledge of the data set. Finally, we design a modified double selection based semi-supervised cluster ensemble framework (MDS-SSCE), which adopts multiple clustering solution selection strategies and an aggregated solution selection function to choose an optimal subset of clustering solutions. The experimental results on cancer gene expression profiles show that (i) FS-SSCE, DS-SSCE and MDS-SSCE are suitable for performing tumor clustering from bio-molecular data, and (ii) MDS-SSCE outperforms a number of state-of-the-art tumor clustering approaches on most of the data sets.
AB - Tumor clustering is one of the important techniques for tumor discovery from cancer gene expression profiles, which is useful for the diagnosis and treatment of cancer. While different algorithms have been proposed for tumor clustering, few make use of expert knowledge to improve the performance of tumor discovery. In this paper, we first view the expert's knowledge as constraints in the process of clustering, and propose a feature selection based semi-supervised cluster ensemble framework (FS-SSCE) for tumor clustering from bio-molecular data. Compared with traditional tumor clustering approaches, FS-SSCE has two distinguishing properties: (1) the adoption of feature selection techniques to dispel the effect of noisy genes, and (2) the employment of the binate constraint based K-means algorithm to take into account the effect of experts' knowledge. Then, we propose a double selection based semi-supervised cluster ensemble framework (DS-SSCE), which not only applies the feature selection technique to perform gene selection on the gene dimension, but also selects an optimal subset of representative clustering solutions in the ensemble and improves the performance of tumor clustering using the normalized cut algorithm. DS-SSCE also introduces a confidence factor into the process of constructing the consensus matrix by considering the prior knowledge of the data set. Finally, we design a modified double selection based semi-supervised cluster ensemble framework (MDS-SSCE), which adopts multiple clustering solution selection strategies and an aggregated solution selection function to choose an optimal subset of clustering solutions. The experimental results on cancer gene expression profiles show that (i) FS-SSCE, DS-SSCE and MDS-SSCE are suitable for performing tumor clustering from bio-molecular data, and (ii) MDS-SSCE outperforms a number of state-of-the-art tumor clustering approaches on most of the data sets.
KW - Cluster ensemble
KW - Feature selection
KW - Gene expression profiles
KW - Semi-supervised clustering
KW - Tumor clustering
UR - http://www.scopus.com/inward/record.url?scp=84930848543&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2014.2315996
DO - 10.1109/TCBB.2014.2315996
M3 - Journal article
C2 - 26356343
AN - SCOPUS:84930848543
SN - 1545-5963
VL - 11
SP - 727
EP - 740
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 4
M1 - 6783979
ER -