A quantitative summary of XML structures

Zi Lin*, Bingsheng He, Byron Choi

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

6 Citations (Scopus)

Abstract

Statistical summaries in relational databases mainly focus on the distribution of data values and have proven useful for various applications, such as query evaluation and data storage. As XML has become widely used, e.g. for online data exchange, the need for corresponding statistical summaries of XML has become evident. While relational techniques may be applicable to the data values in XML documents, novel techniques are required for summarizing the structures of XML documents. In this paper, we propose metrics for major structural properties of XML documents, in particular nestings of entities and one-to-many relationships. Our technique differs from existing ones in that we generate a quantitative summary of an XML structure. Using our approach, we show that some popular real-world and synthetic XML benchmark datasets are highly skewed, hardly hierarchical, and contain few recursions. We hope this preliminary finding sheds light on improving the design of XML benchmarking and experimentation.

Original language: English
Title of host publication: Conceptual Modeling - ER 2006
Subtitle of host publication: 25th International Conference on Conceptual Modeling, Tucson, AZ, USA, November 6-9, 2006, Proceedings
Editors: David W. Embley, Antoni Olivé, Sudha Ram
Publisher: Springer Berlin Heidelberg
Pages: 228-240
Number of pages: 13
Edition: 1st
ISBN (Electronic): 9783540472278
ISBN (Print): 354047224X, 9783540472247
Publication status: Published - 24 Oct 2006
Event: 25th International Conference on Conceptual Modeling - ER 2006 - Tucson, AZ, United States
Duration: 6 Nov 2006 - 9 Nov 2006
https://link.springer.com/book/10.1007/11901181

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Volume: 4215
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: Information Systems and Applications, incl. Internet/Web, and HCI (LNISA)
Name: ER: International Conference on Conceptual Modeling

Conference

Conference: 25th International Conference on Conceptual Modeling - ER 2006
Country/Territory: United States
City: Tucson, AZ
Period: 6/11/06 - 9/11/06

Scopus Subject Areas

  • Theoretical Computer Science
  • General Computer Science

User-Defined Keywords

  • Support Ratio
  • Query Evaluation
  • Selectivity Estimation
  • Document Instance
  • Query Workload
