Traditional hierarchical clustering (HC) methods do not scale with database size. To address this issue, a series of summarization techniques, namely data bubbles (DB) and their improved variants, have been proposed to compress very large databases into representative seed points suitable for subsequent hierarchy construction. However, DB and its variants share two drawbacks: 1) their performance is sensitive to the compression rate, and 2) their performance is sensitive to the initialization, i.e. the number and location of the initial seed points. This paper therefore proposes a new data summarization scheme that is efficient and robust to both the compression rate and the initialization. In the proposed scheme, seed points are randomly initialized and then trained so that they become representative of the data. After training, a link strength network is constructed to enable accurate construction of the hierarchy. Experiments demonstrate that the proposed method produces a high-quality hierarchy even at a very high compression rate.
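The summarize-then-cluster idea described above can be illustrated with a minimal sketch. It is not the proposed method: the training rule is assumed to be plain winner-take-all competitive learning, the link strength network is not reproduced, and ordinary single-linkage agglomeration over the trained seeds stands in for the hierarchy construction. The names `train_seeds`, `single_linkage`, `epochs`, and `lr` are hypothetical and chosen only for illustration.

```python
import math
import random


def train_seeds(points, n_seeds, epochs=5, lr=0.1, rng=None):
    """Summarize `points` with `n_seeds` trained seed points.

    Assumption: winner-take-all competitive learning stands in for the
    training step; the abstract does not specify the actual rule.
    """
    rng = rng or random.Random(0)
    # Random initialization: pick distinct data points as starting seeds.
    seeds = [list(p) for p in rng.sample(points, n_seeds)]
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for p in order:
            # Move only the nearest seed a small step toward the point.
            w = min(range(n_seeds), key=lambda i: math.dist(seeds[i], p))
            seeds[w] = [s + lr * (x - s) for s, x in zip(seeds[w], p)]
    return seeds


def single_linkage(seeds):
    """Agglomerate the seeds; return merges as (members_a, members_b, distance).

    Assumption: standard single linkage replaces the link strength network.
    """
    clusters = [[i] for i in range(len(seeds))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(seeds[i], seeds[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Synthetic data for illustration only: two Gaussian blobs, 1000 points.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
points += [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(500)]

seeds = train_seeds(points, n_seeds=8)   # 1000 points compressed to 8 seeds
merges = single_linkage(seeds)           # hierarchy built on the seeds only
```

In this sketch the quadratic-cost agglomeration runs over 8 seeds rather than 1000 points, which is where the scalability gain of summarization comes from; the number of seeds plays the role of the compression rate.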