TY - JOUR
T1 - Taming System Dynamics on Resource Optimization for Data Processing Workflows
T2 - A Probabilistic Approach
AU - Zhou, Amelie Chi
AU - Xue, Weilin
AU - Xiao, Yao
AU - He, Bingsheng
AU - Ibrahim, Shadi
AU - Cheng, Reynold
N1 - Publisher Copyright: © 2021 IEEE
PY - 2022/6/22
Y1 - 2022/6/22
N2 - In many data-intensive applications, workflow is often used as an important model for organizing data processing tasks and resource provisioning is an important and challenging problem for improving the performance of workflows. Recently, system variations in the cloud and large-scale clusters, such as those in I/O and network performances and failure events, have been observed to greatly affect the performance of workflows. Traditional resource provisioning methods, which overlook these variations, can lead to suboptimal resource provisioning results. In this article, we provide a general solution for workflow performance optimizations considering system variations. Specifically, we model system dynamics as time-dependent random variables and take their probability distributions as optimization input. Despite its effectiveness, this solution involves heavy computation overhead. Thus, we propose three pruning techniques to simplify workflow structure and reduce the probability evaluation overhead. We implement our techniques in a runtime library, which allows users to incorporate efficient probabilistic optimization into existing resource provisioning methods. Experiments show that probabilistic solutions can improve the performance by up to 65 percent compared to state-of-the-art static solutions, and our pruning techniques can greatly reduce the overhead of our probabilistic approach.
AB - In many data-intensive applications, workflow is often used as an important model for organizing data processing tasks and resource provisioning is an important and challenging problem for improving the performance of workflows. Recently, system variations in the cloud and large-scale clusters, such as those in I/O and network performances and failure events, have been observed to greatly affect the performance of workflows. Traditional resource provisioning methods, which overlook these variations, can lead to suboptimal resource provisioning results. In this article, we provide a general solution for workflow performance optimizations considering system variations. Specifically, we model system dynamics as time-dependent random variables and take their probability distributions as optimization input. Despite its effectiveness, this solution involves heavy computation overhead. Thus, we propose three pruning techniques to simplify workflow structure and reduce the probability evaluation overhead. We implement our techniques in a runtime library, which allows users to incorporate efficient probabilistic optimization into existing resource provisioning methods. Experiments show that probabilistic solutions can improve the performance by up to 65 percent compared to state-of-the-art static solutions, and our pruning techniques can greatly reduce the overhead of our probabilistic approach.
KW - Cloud dynamics
KW - data processing workflows
KW - resource optimization
UR - http://www.scopus.com/inward/record.url?scp=85110651412&partnerID=8YFLogxK
UR - https://ieeexplore.ieee.org/document/9462122/keywords#keywords
U2 - 10.1109/TPDS.2021.3091400
DO - 10.1109/TPDS.2021.3091400
M3 - Journal article
AN - SCOPUS:85110651412
SN - 1045-9219
VL - 33
SP - 231
EP - 248
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 1
M1 - 9462122
ER -