Abstract
Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means-type algorithm is best suited to this operation because it clusters large numerical and categorical data sets efficiently. We present an efficient parallel k-means-type algorithm for clustering data sets on a distributed shared-nothing parallel system. Its communication scheme is simple: it performs only one round of information exchange per iteration. We show that the speedup of the algorithm is asymptotically linear when the number of objects is sufficiently large. We implemented the parallel k-means-type algorithm on an IBM SP2 parallel machine, and experimental performance studies show that the algorithm parallelizes well.
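The one-exchange-per-iteration idea can be illustrated with a minimal sketch, simulated sequentially in Python and restricted to numerical attributes. Each simulated processor assigns its own objects to the nearest centroid and accumulates per-cluster sums and counts; the only communication is one combination of those partial results, after which every processor can compute the same new centroids. The function names `local_step` and `parallel_kmeans`, the round-robin partitioning, and the plain summation standing in for a collective all-reduce are illustrative assumptions, not the paper's IBM SP2 implementation. Categorical attributes would presumably need per-cluster category frequencies instead of sums; that extension is also an assumption here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def local_step(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Work done by one (simulated) processor on its own partition:
    assign each local object and accumulate per-cluster sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = nearest(p, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(data: List[Point], centroids: Sequence[Point],
                    num_procs: int, max_iter: int = 100) -> List[Point]:
    """Hypothetical simulation of a shared-nothing parallel k-means."""
    # Round-robin partition: processor i owns objects i, i+P, i+2P, ...
    chunks = [data[i::num_procs] for i in range(num_procs)]
    centroids = [tuple(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        # Local computation, independent on every processor.
        partials = [local_step(chunk, centroids) for chunk in chunks]
        # The single exchange per iteration: combine k*dim sums and k counts
        # across processors (what an all-reduce would do on real hardware).
        total_sums = [[sum(s[j][d] for s, _ in partials) for d in range(dim)]
                      for j in range(k)]
        total_counts = [sum(c[j] for _, c in partials) for j in range(k)]
        # Every processor can now compute identical new centroids;
        # an empty cluster keeps its previous centroid.
        new = [tuple(x / total_counts[j] for x in total_sums[j])
               if total_counts[j] else centroids[j]
               for j in range(k)]
        if new == centroids:  # assignments stable, centroids unchanged
            break
        centroids = new
    return centroids
```

In this sketch the data exchanged per iteration is k·m sums plus k counts (m attributes), independent of the number of objects n, while the local work is roughly proportional to n·k·m/P. As n grows, communication therefore becomes a shrinking fraction of the total, which is consistent with the asymptotically linear speedup claimed above.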
Original language | English |
---|---|
Pages (from-to) | 75-91 |
Number of pages | 17 |
Journal | International Journal of High Speed Computing |
Volume | 11 |
Issue number | 2 |
Publication status | Published - Jun 2000 |
Scopus Subject Areas
- Theoretical Computer Science
- Computational Theory and Mathematics
User-Defined Keywords
- Clustering
- Data mining
- K-means-type algorithm
- Parallel algorithms