TY - JOUR
T1 - Agglomerative fuzzy K-Means clustering algorithm with selection of number of clusters
AU - Li, Mark Junjie
AU - Ng, Kwok Po
AU - Cheung, Yiu Ming
AU - Huang, Joshua Zhexue
N1 - Funding Information:
The work in this paper was supported by the Research Grants Council of Hong Kong SAR under Projects: 7045/04P, 7045/05P, HKBU 2156/04E, and HKBU 210306, by the Faculty Research Grant of Hong Kong Baptist University under Project: HKBU 05-06/II-42, and by the Natural Science Foundation of China under Grant 60603066.
PY - 2008/11
Y1 - 2008/11
N2 - In this paper, we present an agglomerative fuzzy K-Means clustering algorithm for numerical data, an extension to the standard fuzzy K-Means algorithm that introduces a penalty term to the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in K-Means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5,000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20,000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
AB - In this paper, we present an agglomerative fuzzy K-Means clustering algorithm for numerical data, an extension to the standard fuzzy K-Means algorithm that introduces a penalty term to the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in K-Means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5,000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20,000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
KW - Agglomerative
KW - Cluster validation
KW - Fuzzy K-Means clustering
KW - Number of clusters
UR - http://www.scopus.com/inward/record.url?scp=52949101047&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2008.88
DO - 10.1109/TKDE.2008.88
M3 - Journal article
AN - SCOPUS:52949101047
SN - 1041-4347
VL - 20
SP - 1519
EP - 1534
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 11
ER -