From Global to Query-Dependent: Summarization of Large Hierarchical DAGs

Research output: Contribution to journalJournal articlepeer-review

Abstract

Hierarchical directed acyclic graph (DAG) is an essential model for representing terminologies and their hierarchical relationships, such as Disease Ontology and ImageNet categories. Due to the vast number of terminologies and complex structures in a large DAG, it becomes challenging for humans to effectively analyze and explore the hierarchical relationships they encode. Therefore, summarizing hierarchical DAGs is essential for enhancing the interpretability and visualization of the underlying hierarchy. Beyond visual data exploration, hierarchical DAG summarization also supports a range of applications, such as biomedical ontology analytics, snippet generation for information search, and summarized recommendation. In this paper, we address a new problem of finding k representative vertices to summarize a hierarchical DAG. To capture diverse summarization and identify important vertices, we design a summary score function that reflects vertices diversity coverage and structure correlation. The studied problem is theoretically proven to be NP-hard. To tackle it efficiently, we propose a greedy algorithm with an approximation guarantee that iteratively adds vertices with significant summary contributions to the answers. To further enhance the answer quality, we introduce a subtree extraction-based method that is proven to achieve higher-quality answers. Additionally, we develop a scalable algorithm, k-PCGS, which employs candidate pruning and DAG compression for large-scale hierarchical DAGs. For the query-dependent problem, we propose an index-based method and several optimization techniques to improve efficiency. Extensive experiments on large real-world datasets demonstrate the effectiveness and efficiency of our proposed algorithms.
Original languageEnglish
Number of pages31
JournalACM Transactions on Database Systems
DOIs
Publication statusE-pub ahead of print - 20 Sept 2025

User-Defined Keywords

  • Hierarchy
  • Data summarization
  • Graph algorithm

Fingerprint

Dive into the research topics of 'From Global to Query-Dependent: Summarization of Large Hierarchical DAGs'. Together they form a unique fingerprint.

Cite this