Abstract
Traditional cluster ensemble approaches have several limitations: (1) few make use of prior knowledge provided by experts; (2) good performance is difficult to achieve on high-dimensional datasets; (3) all ensemble members receive equal weights, which ignores the different contributions of different members; and (4) not all pairwise constraints contribute to the final result. To address these limitations, we propose double weighting semi-supervised ensemble clustering based on selected constraint projection (DCECP), which applies both constraint weighting and ensemble member weighting. Specifically, DCECP first adopts the random subspace technique in combination with the constraint projection procedure to handle high-dimensional datasets. Second, it treats the prior knowledge of experts as pairwise constraints and assigns different subsets of pairwise constraints to different ensemble members. An adaptive ensemble member weighting process is designed to associate different weight values with different ensemble members. Third, the weighted normalized cut algorithm is adopted to summarize the clustering solutions and generate the final result. Finally, nonparametric statistical tests are used to compare multiple algorithms on real-world datasets. Our experiments on 15 high-dimensional datasets show that DCECP performs better than most clustering algorithms.
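The idea of ensemble member weighting can be illustrated with a small sketch. It is not the DCECP procedure itself. It assumes, purely for illustration, that each member's weight is the fraction of must-link and cannot-link constraints its clustering satisfies, which stands in for the adaptive weighting process named in the abstract. The function names `constraint_satisfaction`, `member_weights`, and `weighted_coassociation` are hypothetical, and the random subspace and constraint projection steps are omitted. The resulting weighted co-association matrix is the kind of similarity graph that a consensus step such as a weighted normalized cut could then partition.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def constraint_satisfaction(labels: Sequence[int],
                            must_link: Sequence[Pair],
                            cannot_link: Sequence[Pair]) -> float:
    """Fraction of pairwise constraints that one clustering satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no prior knowledge: every member scores the same
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def member_weights(ensemble: Sequence[Sequence[int]],
                   must_link: Sequence[Pair],
                   cannot_link: Sequence[Pair]) -> List[float]:
    """Assumed weighting rule: satisfaction scores normalized to sum to 1."""
    scores = [constraint_satisfaction(labels, must_link, cannot_link)
              for labels in ensemble]
    total = sum(scores)
    if total == 0:
        # No member satisfies any constraint: fall back to equal weights.
        return [1.0 / len(ensemble)] * len(ensemble)
    return [score / total for score in scores]


def weighted_coassociation(ensemble: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[List[float]]:
    """Entry (i, j) sums the weights of members that co-cluster i and j."""
    n = len(ensemble[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(ensemble, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight
    return matrix


# Toy data (made up for this example): three clusterings of four samples.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
must_link = [(0, 1)]    # samples 0 and 1 should share a cluster
cannot_link = [(1, 2)]  # samples 1 and 2 should be separated

weights = member_weights(ensemble, must_link, cannot_link)
similarity = weighted_coassociation(ensemble, weights)
print(weights)           # [0.5, 0.25, 0.25]
print(similarity[0][1])  # 0.75
```

In this toy run, the first clustering satisfies both constraints and so receives twice the weight of the other two. An unweighted ensemble would instead give all three members equal influence on the similarity matrix.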
Field | Value |
---|---|
Original language | English |
Article number | 8323237 |
Pages (from-to) | 2394-2407 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 30 |
Issue number | 12 |
Publication status | Published - 1 Dec 2018 |
Scopus Subject Areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
User-Defined Keywords
- Cluster ensemble
- pairwise constraint
- projection
- semi-supervised clustering