Agglomerative fuzzy K-Means clustering algorithm with selection of number of clusters

Mark Junjie Li, Kwok Po Ng, Yiu Ming Cheung, Joshua Zhexue Huang

Research output: Contribution to journal › Journal article › peer-review

240 Citations (Scopus)

Abstract

In this paper, we present an agglomerative fuzzy K-Means clustering algorithm for numerical data. It extends the standard fuzzy K-Means algorithm by introducing a penalty term into the objective function, which makes the clustering process insensitive to the initial cluster centers. The new algorithm can produce more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in K-Means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5,000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20,000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI have demonstrated the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
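The abstract does not state the form of the penalty term, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes an entropy-type penalty, lam * sum(u * log u), added to the usual sum of membership-weighted squared Euclidean distances; under that assumption the membership update becomes a softmax over negative distances scaled by lam. The names penalized_fuzzy_kmeans, merge_close_centers, lam, and tol are hypothetical and chosen for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_fuzzy_kmeans(points, centers, lam, n_iter=100):
    """Alternate membership and center updates for a fixed penalty weight.

    Assumed objective (not taken from the abstract):
        sum_ij u_ij * d_ij + lam * sum_ij u_ij * log(u_ij)
    with each row of memberships summing to 1.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(n_iter):
        # Membership update: softmax of -d_ij / lam over the clusters.
        memberships = []
        for p in points:
            d = [sq_dist(p, c) for c in centers]
            d_min = min(d)  # shift by the minimum for numerical stability
            w = [math.exp(-(dj - d_min) / lam) for dj in d]
            total = sum(w)
            memberships.append([wj / total for wj in w])
        # Center update: membership-weighted mean of the points.
        for j in range(len(centers)):
            mass = sum(u[j] for u in memberships)
            if mass > 0:
                centers[j] = [
                    sum(u[j] * p[k] for u, p in zip(memberships, points)) / mass
                    for k in range(len(points[0]))
                ]
    return centers, memberships


def merge_close_centers(centers, tol):
    """Hypothetical agglomeration step: drop centers within tol of a kept one."""
    kept = []
    for c in centers:
        if all(sq_dist(c, k) > tol ** 2 for k in kept):
            kept.append(c)
    return kept
```

In this sketch, a larger lam flattens the memberships, so centers that start in different places drift toward the same locations and can be merged. Running it with more initial centers than expected clusters and counting the centers that survive merge_close_centers gives a candidate number of clusters, which would still need to be checked with a cluster validation index.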

Original language: English
Pages (from-to): 1519-1534
Number of pages: 16
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 20
Issue number: 11
DOIs
Publication status: Published - Nov 2008

Scopus Subject Areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

User-Defined Keywords

  • Agglomerative
  • Cluster validation
  • Fuzzy K-Means clustering
  • Number of clusters
