TY - JOUR
T1 - DtCraft: A High-Performance Distributed Execution Engine at Scale
AU - Huang, Tsung-Wei
AU - Lin, Chun-Xun
AU - Wong, Martin D. F.
N1 - Funding information:
10.13039/100000001-National Science Foundation (Grant Number: CCF-1421563 and CCF-171883)
Publisher copyright:
© 2018 IEEE
PY - 2019/6
Y1 - 2019/6
N2 - Recent years have seen rapid growth in data-driven distributed systems, such as Hadoop MapReduce, Spark, and Dryad. However, the counterparts for high-performance or compute-intensive applications, including large-scale optimizations, modeling, and simulations, are still nascent. In this paper, we introduce DtCraft, a modern C++-based distributed execution engine to streamline the development of high-performance parallel applications. Users need no understanding of distributed computing and can focus on high-level development, leaving difficult details, such as concurrency control, workload distribution, and fault tolerance, to be handled transparently by our system. We have evaluated DtCraft on both micro-benchmarks and large-scale optimization problems, and shown promising performance from single multicore machines to clusters of computers. In a particular semiconductor design problem, we achieved a 30× speedup with 40 nodes and 15× less development effort than a hand-crafted implementation.
AB - Recent years have seen rapid growth in data-driven distributed systems, such as Hadoop MapReduce, Spark, and Dryad. However, the counterparts for high-performance or compute-intensive applications, including large-scale optimizations, modeling, and simulations, are still nascent. In this paper, we introduce DtCraft, a modern C++-based distributed execution engine to streamline the development of high-performance parallel applications. Users need no understanding of distributed computing and can focus on high-level development, leaving difficult details, such as concurrency control, workload distribution, and fault tolerance, to be handled transparently by our system. We have evaluated DtCraft on both micro-benchmarks and large-scale optimization problems, and shown promising performance from single multicore machines to clusters of computers. In a particular semiconductor design problem, we achieved a 30× speedup with 40 nodes and 15× less development effort than a hand-crafted implementation.
U2 - 10.1109/TCAD.2018.2834422
DO - 10.1109/TCAD.2018.2834422
M3 - Journal article
SN - 0278-0070
VL - 38
SP - 1070
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 6
M1 - 14
ER -