PAW: Data Partitioning Meets Workload Variance

Zhe Li, Man Lung Yiu, Tsz Nam Chan

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

3 Citations (Scopus)

Abstract

In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied to a dataset in order to enhance performance and availability. Recently, partitioning methods have been designed to optimize the query performance of partitions with respect to the historical query workload. Nevertheless, in practice, future query workloads may deviate from the historical query workload, thus deteriorating the performance of existing partitioning methods.
To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregularly shaped partition regions to further optimize the query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70× more efficient than the state-of-the-art method.
Original language: English
Title of host publication: Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
Publisher: IEEE
Pages: 123-135
Number of pages: 13
ISBN (Electronic): 9781665408837
ISBN (Print): 9781665408844
Publication status: Published - May 2022
Event: 38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Kuala Lumpur, Malaysia
Duration: 9 May 2022 to 12 May 2022
https://icde2022.ieeecomputer.my/
https://ieeexplore.ieee.org/xpl/conhome/9835153/proceeding

Publication series

Name: Proceedings of IEEE International Conference on Data Engineering (ICDE)
ISSN (Print): 1063-6382
ISSN (Electronic): 2375-026X

Conference

Conference: 38th IEEE International Conference on Data Engineering, ICDE 2022
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 9/05/22 to 12/05/22

Scopus Subject Areas

  • Software
  • Information Systems
  • Signal Processing
