Skip to main navigation Skip to search Skip to main content

Incorporating probabilistic optimizations for resource provisioning of data processing workflows

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

2 Citations (Scopus)

Abstract

Workflow is an important model for big data processing and resource provisioning is crucial to the performance of workflows. Recently, system variations in the cloud and large-scale clusters, such as those in I/O and network performances, have been observed to greatly affect the performance of workflows. Traditional resource provisioning methods, which overlook these variations, can lead to suboptimal resource provisioning results. In this paper, we provide a general solution for workflow performance optimizations considering system variations. Specifically, we model system variations as time-dependent random variables and take their probability distributions as optimization input. Despite its effectiveness, this solution involves heavy computation overhead. Thus, we propose three pruning techniques to simplify workflow structure and reduce the probability evaluation overhead. We implement our techniques in a runtime library, which allows users to incorporate efficient probabilistic optimization into existing resource provisioning methods. Experiments show that probabilistic solutions can improve the performance by 51% compared to state-of-the-art static solutions while guaranteeing budget constraint, and our pruning techniques can greatly reduce the overhead of probabilistic optimization.

Original languageEnglish
Title of host publicationProceedings of the 48th International Conference on Parallel Processing, ICPP 2019
Place of PublicationNew York
PublisherAssociation for Computing Machinery (ACM)
Number of pages10
ISBN (Electronic)9781450362955
DOIs
Publication statusPublished - 5 Aug 2019
Event48th International Conference on Parallel Processing, ICPP 2019 - Kyoto, Japan
Duration: 5 Aug 20198 Aug 2019
https://dl.acm.org/doi/proceedings/10.1145/3337821 (Conference Proceedings)

Publication series

NameACM International Conference Proceeding Series

Conference

Conference48th International Conference on Parallel Processing, ICPP 2019
Country/TerritoryJapan
CityKyoto
Period5/08/198/08/19
Internet address

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
    SDG 9 Industry, Innovation, and Infrastructure

User-Defined Keywords

  • Resource provisioning
  • Cloud dynamics
  • Workflows

Fingerprint

Dive into the research topics of 'Incorporating probabilistic optimizations for resource provisioning of data processing workflows'. Together they form a unique fingerprint.

Cite this