TY - JOUR
T1 - A GPU-accelerated Framework for Path-based Timing Analysis
AU - Guo, Guannan
AU - Huang, Tsung Wei
AU - Lin, Yibo
AU - Guo, Zizheng
AU - Yellapragada, Sushma
AU - Wong, Martin D.F.
N1 - Publisher Copyright: © 1982-2012 IEEE.
PY - 2023/11
Y1 - 2023/11
N2 - As a key routine in static timing analysis (STA), path-based analysis (PBA) plays an important role in refining the critical path report by reducing excessive slack pessimism. PBA is also well known for its long execution time, which has made it an active topic for parallel computing in the STA community. However, nearly all parallel PBA algorithms are restricted to CPU architectures, which greatly limits their scalability. To reach a new performance milestone for PBA, we must leverage the high-throughput computing of the graphics processing unit (GPU). In this work, we therefore propose a new GPU-accelerated PBA framework built on compact data structures and highly efficient kernels. By integrating GPU-accelerated preprocessing steps, our framework can also effectively handle extensive critical path constraints. In addition, we highlight many optimization techniques that overcome execution bottlenecks and further boost performance. In experiments, we demonstrate a 543× speed-up over the state-of-the-art PBA algorithm on a design with 1.6 million gates, and a 25×–45× speed-up over the state-of-the-art parallel PBA algorithm running on 40 CPU cores. A fully optimized framework can achieve a further 3×–5× speed-up on top of that.
AB - As a key routine in static timing analysis (STA), path-based analysis (PBA) plays an important role in refining the critical path report by reducing excessive slack pessimism. PBA is also well known for its long execution time, which has made it an active topic for parallel computing in the STA community. However, nearly all parallel PBA algorithms are restricted to CPU architectures, which greatly limits their scalability. To reach a new performance milestone for PBA, we must leverage the high-throughput computing of the graphics processing unit (GPU). In this work, we therefore propose a new GPU-accelerated PBA framework built on compact data structures and highly efficient kernels. By integrating GPU-accelerated preprocessing steps, our framework can also effectively handle extensive critical path constraints. In addition, we highlight many optimization techniques that overcome execution bottlenecks and further boost performance. In experiments, we demonstrate a 543× speed-up over the state-of-the-art PBA algorithm on a design with 1.6 million gates, and a 25×–45× speed-up over the state-of-the-art parallel PBA algorithm running on 40 CPU cores. A fully optimized framework can achieve a further 3×–5× speed-up on top of that.
KW - Graphics processing unit (GPU) acceleration
KW - static timing analysis (STA)
UR - http://www.scopus.com/inward/record.url?scp=85159843646&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2023.3272274
DO - 10.1109/TCAD.2023.3272274
M3 - Journal article
AN - SCOPUS:85159843646
SN - 0278-0070
VL - 42
SP - 4219
EP - 4232
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
ER -