Abstract
In distributed storage systems (e.g., HDFS, Amazon S3, Databricks), partitioning is applied to a dataset to improve performance and availability. Recent partitioning methods optimize the query performance of partitions with respect to the historical query workload. In practice, however, future query workloads may deviate from the historical query workload, which degrades the performance of existing partitioning methods.
To fill this research gap, we model the variance of future query workloads from the historical query workload, then exploit this characteristic to produce partitions that perform well for future query workloads. In addition, we explore the space of irregularly shaped partition regions to further optimize query performance. Experimental results on TPC-H and real datasets show that our proposal is up to 70× more efficient than the state-of-the-art method.
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022 |
Publisher | IEEE |
Pages | 123-135 |
Number of pages | 13 |
ISBN (Electronic) | 9781665408837 |
ISBN (Print) | 9781665408844 |
Publication status | Published - May 2022 |
Event | 38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Kuala Lumpur, Malaysia. Duration: 9 May 2022 → 12 May 2022. https://icde2022.ieeecomputer.my/ ; https://ieeexplore.ieee.org/xpl/conhome/9835153/proceeding |
Publication series
Name | Proceedings of IEEE International Conference on Data Engineering (ICDE) |
---|---|
ISSN (Print) | 1063-6382 |
ISSN (Electronic) | 2375-026X |
Conference
Conference | 38th IEEE International Conference on Data Engineering, ICDE 2022 |
---|---|
Country/Territory | Malaysia |
City | Kuala Lumpur |
Period | 9/05/22 → 12/05/22 |
Scopus Subject Areas
- Software
- Information Systems
- Signal Processing