Vectorizing and querying large XML repositories

Peter Buneman, Byron Choi, Wenfei Fan, Robert Hutchison, Robert Mann, Stratis D. Viglas

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

42 Citations (Scopus)


Vertical partitioning is a well-known technique for optimizing query performance in relational databases. An extreme form of this technique, which we call vectorization, is to store each column separately. We use a generalization of vectorization as the basis for a native XML store. The idea is to decompose an XML document into a set of vectors that contain the data values and a compressed skeleton that describes the structure. In order to query this representation and produce results in the same vectorized format, we consider a practical fragment of XQuery and introduce the notion of query graphs and a novel graph reduction algorithm that allows us to leverage relational optimization techniques as well as to reduce unnecessary loading of data vectors and decompression of skeletons. A preliminary experimental study based on scientific and synthetic XML data repositories on the order of gigabytes supports the claim that these techniques are scalable and have the potential to provide performance comparable with established relational database technology.
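The decomposition into data vectors and a compressed skeleton can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function `vectorize`, the choice of keying each vector by its root-to-element tag path, and the compression scheme (hash-consing identical subtrees into a DAG and collapsing runs of identical consecutive siblings into counts) are assumptions made for illustration. Attributes, tail text and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def vectorize(xml_text):
    """Hypothetical sketch: split an XML document into data vectors and a skeleton.

    Returns (root_id, table, vectors):
      * vectors maps each root-to-element tag path to the text values found
        at that path, in document order (one "column" per path);
      * table maps each distinct skeleton node (tag, runs) to an integer id,
        so identical subtrees are stored once (a DAG rather than a tree);
      * runs is a tuple of (child_id, count) pairs, collapsing consecutive
        identical children.
    Attributes, tail text and mixed content are ignored for simplicity.
    """
    root = ET.fromstring(xml_text)
    vectors = defaultdict(list)
    table = {}

    def intern(key):
        # Hash-consing: structurally identical nodes receive the same id.
        if key not in table:
            table[key] = len(table)
        return table[key]

    def visit(elem, path):
        path = f"{path}/{elem.tag}"
        child_ids = []
        text = (elem.text or "").strip()
        if text:
            # The value goes to its path's vector; the skeleton keeps only a
            # placeholder recording that a text value occurs here.
            vectors[path].append(text)
            child_ids.append(intern(("#text", ())))
        for child in elem:
            child_ids.append(visit(child, path))
        # Collapse runs of identical consecutive children into (id, count).
        runs = []
        for cid in child_ids:
            if runs and runs[-1][0] == cid:
                runs[-1] = (cid, runs[-1][1] + 1)
            else:
                runs.append((cid, 1))
        return intern((elem.tag, tuple(runs)))

    root_id = visit(root, "")
    return root_id, table, dict(vectors)


doc = (
    "<db>"
    "<book><title>A</title><year>2001</year></book>"
    "<book><title>B</title><year>2005</year></book>"
    "</db>"
)
root_id, table, vectors = vectorize(doc)
print(vectors)  # → {'/db/book/title': ['A', 'B'], '/db/book/year': ['2001', '2005']}
root_key = next(k for k, v in table.items() if v == root_id)
print(len(table), root_key)  # → 5 ('db', ((3, 2),))
```

The two `book` subtrees share a single skeleton node, which the root references once with a count of 2, while the titles and years land in two separate vectors, mirroring the column-per-path layout described above.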
Original language: English
Title of host publication: 21st International Conference on Data Engineering (ICDE'05)
Number of pages: 12
ISBN (Print): 0769522858
Publication status: Published - Apr 2005
Event: 21st International Conference on Data Engineering, ICDE 2005 - Tokyo, Japan
Duration: 5 Apr 2005 to 8 Apr 2005

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1063-6382
ISSN (Electronic): 2375-026X


Conference: 21st International Conference on Data Engineering, ICDE 2005