Abstract
Data summarization that presents a small subset of a dataset to users has been widely applied in numerous applications and systems. Many datasets are coded with hierarchical terminologies, e.g., gene ontology, disease ontology, to name a few. In this paper, we study the weighted tree summarization. We motivate and formulate our kWTS- problem as selecting a diverse set of k nodes to summarize a hierarchical tree T with weighted terminologies. We first propose an efficient greedy tree summarization algorithm GTS. It solves the problem with (1-1/e)-approximation guarantee. Although GTS achieves quality-guaranteed answers approximately, but it is still not optimal. To tackle the problem optimally, we further develop a dynamic programming algorithm OTS to obtain optimal answers for kWTS problem in O(nhk3) time, where n,h are the node size and height in tree T. The algorithm complexity and correctness of OTS are theoretically analyzed. In addition, we propose a useful optimization technique of tree reduction to remove useless nodes with zero weights and shrink the tree into a smaller one, which ensures the efficiency acceleration of both GTS and OTS in real-world datasets. Moreover, we illustrate one useful application of graph visualization based on the answer of k-sized tree summarization and show it in a novel case study. Extensive experimental results on real-world datasets show the effectiveness and efficiency of our proposed approximate and optimal algorithms for tree summarization. Furthermore, we conduct a usability evaluation of attractive topic recommendation on ACM Computing Classification System dataset to validate the usefulness of our model and algorithms.
Original language | English |
---|---|
Pages (from-to) | 2500-2514 |
Number of pages | 15 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 35 |
Issue number | 3 |
Early online date | 19 Oct 2021 |
DOIs | |
Publication status | Published - 1 Mar 2023 |
Scopus Subject Areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics
User-Defined Keywords
- Data Summarization
- Hierarchy
- Optimal Algorithm
- Top-k
- Tree