Document decomposition for XML compression: A heuristic approach

Byron Choi*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

4 Citations (Scopus)


Sharing of common subtrees has been reported to be useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having a large number of identical subtrees is inherently low. In this paper, we propose a method of decomposing XML documents for better compression. We propose a heuristic method of locating minor irregularities in XML documents. The irregularities are then projected out from the original XML document. We refer to this process as document decomposition. We demonstrate that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrate that the compressed skeletons, for all real-world datasets known to us, fit comfortably into the main memory of today's commodity computers. Preliminary results on querying compressed skeletons validate the effectiveness of our approach.
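
To illustrate the idea in the abstract, the following is a minimal Python sketch, not the paper's algorithm. It assumes a toy tree representation `(tag, children)` and hypothetical helper names (`share_subtrees`, `dag_size`, `project_out`, `book`). Subtree sharing is modeled as bottom-up hash-consing, and the irregular tag to project out is chosen by hand here, whereas the paper locates irregularities with a heuristic that the abstract does not detail.

```python
# A tree is (tag, children), where children is a tuple of trees.
# This representation and all names below are assumptions for illustration.

def share_subtrees(tree, table):
    """Hash-cons `tree` bottom-up and return the id of its unique DAG node.

    Two subtrees get the same id only if their structure is identical.
    """
    tag, children = tree
    key = (tag, tuple(share_subtrees(child, table) for child in children))
    if key not in table:
        table[key] = len(table)
    return table[key]


def dag_size(tree):
    """Number of distinct subtrees, i.e. nodes left after sharing."""
    table = {}
    share_subtrees(tree, table)
    return len(table)


def project_out(tree, tag_to_remove):
    """Copy `tree`, dropping every subtree rooted at `tag_to_remove`."""
    tag, children = tree
    kept = tuple(
        project_out(child, tag_to_remove)
        for child in children
        if child[0] != tag_to_remove
    )
    return (tag, kept)


def book(extra=()):
    """Build a toy <book> element with optional irregular children."""
    return ("book", (("title", ()), ("author", ())) + tuple(extra))


note = ("note", ())

# Four structurally identical books share one DAG node.
regular = ("lib", tuple(book() for _ in range(4)))

# Occasional <note> children make otherwise similar books distinct.
irregular = ("lib", (book(), book([note]), book(), book([note, note])))

# Projecting out the irregular tag restores the sharing.
skeleton = project_out(irregular, "note")

print(dag_size(regular), dag_size(irregular), dag_size(skeleton))
```

In this toy input, the occasional `note` children prevent the `book` subtrees from being shared, and removing them yields a skeleton as compact as the fully regular document. A full decomposition would also keep the projected-out parts and compress them separately, which this sketch omits.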

Original language: English
Title of host publication: Database Systems for Advanced Applications
Subtitle of host publication: 11th International Conference, DASFAA 2006, Singapore, April 12-15, 2006, Proceedings
Editors: Mong Lee, Kian-Lee Tan, Vilas Wuwongse
Publisher: Springer Berlin Heidelberg
Number of pages: 16
ISBN (Electronic): 9783540333388
ISBN (Print): 9783540333371
Publication status: Published - 11 Mar 2006
Event: 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006, Singapore
Duration: 12 Apr 2006 to 15 Apr 2006

Publication series

Name: Lecture Notes in Computer Science (LNCS)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: Information Systems and Applications, incl. Internet/Web, and HCI (LNISA)
Name: DASFAA: International Conference on Database Systems for Advanced Applications


Conference: 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006

Scopus Subject Areas

  • Theoretical Computer Science
  • Computer Science (all)

User-Defined Keywords

  • Support Ratio
  • Query Evaluation
  • Query Performance
  • Good Compression
  • Path Query

