TY - JOUR
T1 - Adaptive Ensembling of Semi-Supervised Clustering Solutions
AU - Yu, Zhiwen
AU - Kuang, Zongqiang
AU - Liu, Jiming
AU - Chen, Hongsheng
AU - Zhang, Jun
AU - You, Jane
AU - Wong, Hau San
AU - Han, Guoqiang
N1 - Funding Information:
The work described in this paper was partially funded by grants from the NSFC (No. U1611461, No. 61572199, No. 61502174, and No. 61502173), the Guangdong Natural Science Funds for Distinguished Young Scholars (No. S2013050014677), the Science and Technology Planning Project of Guangdong Province, China (No. 2015A050502011, No. 2016B090918042, No. 2016A050503015, and No. 2016B010127003), the Fundamental Research Funds for the Central Universities (D2153950), and the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 11300715 and No. 152202/14E). Zhiwen Yu and Jiming Liu are the corresponding authors. The authors are grateful to the anonymous reviewers for their constructive advice on the revision of the manuscript.
PY - 2017/8
Y1 - 2017/8
AB - Conventional semi-supervised clustering approaches have several shortcomings, such as (1) not fully utilizing all useful must-link and cannot-link constraints, (2) not considering how to deal with noisy high-dimensional data, and (3) not fully addressing the need for an adaptive process to further improve the performance of the algorithm. In this paper, we first propose a transitive-closure-based constraint propagation approach, which uses the transitive closure operator and affinity propagation to address the first limitation. Then, a random-subspace-based semi-supervised clustering ensemble framework, with a set of proposed confidence factors, is designed to address the second limitation and provide more stable, robust, and accurate results. Next, an adaptive semi-supervised clustering ensemble framework, which adopts a newly designed adaptive process to search for the optimal subspace set, is proposed to address the third limitation. Finally, we adopt a set of nonparametric tests to compare different semi-supervised clustering ensemble approaches over multiple datasets. The experimental results on 20 real high-dimensional cancer datasets with noisy genes and 10 datasets from the UCI and KEEL repositories show that (1) the proposed approaches work well on most of the real-world datasets, and (2) they outperform other state-of-the-art approaches on 12 out of 20 cancer datasets and 8 out of 10 UCI machine learning datasets.
KW - Clustering
KW - Clustering ensemble
KW - Semi-supervised clustering
UR - http://www.scopus.com/inward/record.url?scp=85029010856&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2695615
DO - 10.1109/TKDE.2017.2695615
M3 - Journal article
AN - SCOPUS:85029010856
SN - 1041-4347
VL - 29
SP - 1577
EP - 1590
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 8
M1 - 7904723
ER -