Quality preserved data summarization for fast hierarchical clustering

Research output: Chapter in book/report/conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Traditional hierarchical clustering (HC) methods do not scale with database size. To address this issue, a series of summarization techniques, i.e. data bubbles (DB) and their improved versions, have been proposed to compress very large databases into representative seed points suitable for subsequent hierarchy construction. However, DB and its variants share two drawbacks: 1) their performance is sensitive to the compression rate, and 2) their performance is sensitive to the initialization, i.e. the number and location of the initial seed points. This paper therefore proposes a new data summarization scheme that is efficient and robust to both the compression rate and the initialization. In the proposed scheme, seed points are not only randomly initialized but also trained to make them representative. After training, a link strength network is constructed to build an accurate hierarchy structure. Experiments demonstrate that the proposed method can produce a high-quality hierarchy structure at a very high compression rate.
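
The abstract names three stages (random seed initialization, seed training, and a link strength network used to build the hierarchy) without giving their details. The following is a minimal Python sketch of that pipeline under loudly stated assumptions, not the paper's algorithm: the training rule (move the nearest seed toward each sample), the link strength definition (number of points whose two nearest seeds are a given pair), and the merge order (strongest link first) are all illustrative choices, and the names `summarize` and `build_hierarchy` are hypothetical.

```python
import math
import random
from collections import defaultdict


def summarize(points, n_seeds, epochs=5, lr=0.1, rng=None):
    """Compress `points` into `n_seeds` trained seeds plus pairwise link strengths."""
    rng = rng or random.Random(0)
    # Random initialization: draw the initial seeds from the data itself.
    seeds = [list(p) for p in rng.sample(points, n_seeds)]

    # Training (assumed rule): pull the nearest seed a step toward each sample.
    for _ in range(epochs):
        for p in points:
            w = min(range(n_seeds), key=lambda i: math.dist(seeds[i], p))
            seeds[w] = [s + lr * (x - s) for s, x in zip(seeds[w], p)]

    # Link strength (assumed definition): count the points whose two
    # nearest seeds are the pair (i, j).
    links = defaultdict(int)
    for p in points:
        order = sorted(range(n_seeds), key=lambda i: math.dist(seeds[i], p))
        i, j = sorted(order[:2])
        links[(i, j)] += 1
    return seeds, dict(links)


def build_hierarchy(n_seeds, links):
    """Merge seed groups from the strongest link to the weakest (assumed order)."""
    parent = list(range(n_seeds))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for (i, j), strength in sorted(links.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # skip links inside an already merged group
            parent[rj] = ri
            merges.append((i, j, strength))
    return merges
```

Calling `summarize(points, n_seeds)` on a list of coordinate tuples and passing the resulting links to `build_hierarchy(n_seeds, links)` yields a merge list over the seeds only, so the cost of the hierarchy step depends on the number of seeds rather than on the size of the database.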

Original language: English
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4139-4146
Number of pages: 8
ISBN (Electronic): 9781509006199
Publication status: Published - 31 Oct 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: 24 Jul 2016 to 29 Jul 2016

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
Volume: 2016-October

Conference

Conference: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country/Territory: Canada
City: Vancouver
Period: 24/07/16 to 29/07/16

Scopus Subject Areas

  • Software
  • Artificial Intelligence
