TY - GEN
T1 - Quality preserved data summarization for fast hierarchical clustering
AU - Zhang, Yiqun
AU - Cheung, Yiu Ming
AU - Liu, Yang
N1 - Funding Information:
This work was supported by the National Natural Science Foundation of China under Grant 61272366, the Faculty Research Grant of Hong Kong Baptist University (HKBU) under Project Code FRG2/15-16/049, and the HKBU Knowledge Transfer Office under Grant MPCF-005-2014/2015.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - Traditional hierarchical clustering (HC) methods are not scalable with the size of databases. To address this issue, a series of summarization techniques, i.e., data bubbles (DB) and their improved versions, has been proposed to compress very large databases into representative seed points suitable for subsequent hierarchy construction. However, DB and its variants share two drawbacks: 1) their performance is sensitive to the compression rate, and 2) their performance is sensitive to the initialization, i.e., the number and locations of the initialized seed points. This paper therefore proposes a new data summarization scheme that is efficient and robust to both the compression rate and the initialization. In the proposed scheme, seed points are not only randomly initialized but also trained to make them representative. After the training, a link strength network is constructed to achieve accurate hierarchy construction. Experiments demonstrate that the proposed method can produce a high-quality hierarchy structure at a very high compression rate.
UR - http://www.scopus.com/inward/record.url?scp=85007203258&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2016.7727739
DO - 10.1109/IJCNN.2016.7727739
M3 - Conference proceeding
AN - SCOPUS:85007203258
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 4139
EP - 4146
BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016
PB - IEEE
T2 - 2016 International Joint Conference on Neural Networks, IJCNN 2016
Y2 - 24 July 2016 through 29 July 2016
ER -