TY - JOUR
T1 - Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data
AU - Yu, Zhiwen
AU - Chen, Hantao
AU - You, Jane
AU - Liu, Jiming
AU - Wong, Hau San
AU - Han, Guoqiang
AU - Li, Le
N1 - The work described in this paper was partially funded by the grant from the Hong Kong Scholars Program (Project no. XJ2012015) and supported by grants from the National Natural Science Foundation of China (NSFC) (Project nos. 61273363, 61379033, 61472145), the NSFC-Guangdong Joint Fund (Project No. U1035004), the New Century Excellent Talents in University (Project No. NCET-11-0165), the Guangdong Natural Science Funds for Distinguished Young Scholar (Project No. S2013050014677), a grant from Science and Technology Planning Project of Guangzhou (Project No. 11A11080267), a grant from China Postdoctoral Science Foundation (Project No. 2013M540655), a grant from Foundation of Guangdong Educational Committee (Project No. 2012KJCX0011), a grant from the Fundamental Research Funds for the Central Universities (Project No. 2014G0007), a grant from Key Enterprises and Innovation Organizations in Nanshan District in Shenzhen (Project No. KC2013ZDZJ0007A), a grant from Natural Science Foundation of Guangdong Province, China (Project No. S2012010009961), a grant from the Doctoral Program of Higher Education (Project No. 20110172120027), a grant from the Cooperation Project in Industry, Education and Academy of Guangdong Province and Ministry of Education of China (Project No. 2011B090400032), a grant from the key lab of cloud computing and big data in Guangzhou (Project No. SITGZ [2013] 268-6), a grant from the Hong Kong Baptist University (Project no. RGC/HKBU211212), a grant from the City University of Hong Kong (Project No. 7004047) and the grants from the Hong Kong Polytechnic University (G-YK53 and G-YK77).
PY - 2015/7/1
Y1 - 2015/7/1
N2 - Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While quite a number of research works perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and further improve its performance by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms.
AB - Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While quite a number of research works perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and further improve its performance by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms.
KW - adaptive process
KW - cancer
KW - Cluster ensemble
KW - Clustering analysis
KW - feature selection
KW - gene expression profiles
KW - microarray
KW - optimization
UR - http://www.scopus.com/inward/record.url?scp=84939150619&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2014.2359433
DO - 10.1109/TCBB.2014.2359433
M3 - Journal article
C2 - 26357330
AN - SCOPUS:84939150619
SN - 1545-5963
VL - 12
SP - 887
EP - 901
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 4
M1 - 6948356
ER -