Top-K Representative Search for Comparative Tree Summarization

Yuqi Chen, Xin Huang*, Bilian Chen*

*Corresponding author for this work

Research output: Contribution to journalJournal articlepeer-review

Abstract

Data summarization aims at utilizing a small-scale summary to represent massive datasets as a whole, which is useful for visualization and information sipped generation. However, most existing studies of hierarchical summarization only work on one single tree by selecting k representative nodes, which neglects an important problem of comparative summarization on two trees. In this paper, given two trees with the same topology structure and different node weights, we aim at finding k representative nodes, where k1 nodes summarize the common relationship between them and k2 nodes highlight significantly different subtrees meanwhile satisfying k1 + k2 = k. To optimize summarization results, we introduce a scaling coefficient for balancing the summary view between two subtrees in terms of similarity and difference. Additionally, we propose a novel definition based on the Hellinger distance to quantify the node distribution difference between two subtrees. We present a greedy algorithm SVDT to find high-quality results with approximation guaranteed in an efficient way. Furthermore, we explore an extension of our comparative summarization to handle two trees with different structures. Extensive experiments demonstrate the effectiveness and efficiency of our SVDT algorithm against existing summarization competitors.

Original languageEnglish
Pages (from-to)1-7
Number of pages7
JournalIEEE Transactions on Knowledge and Data Engineering
DOIs
Publication statusE-pub ahead of print - 9 May 2025

User-Defined Keywords

  • hellinger distance
  • top-K diversification
  • Tree summarization

Fingerprint

Dive into the research topics of 'Top-K Representative Search for Comparative Tree Summarization'. Together they form a unique fingerprint.

Cite this