TY - JOUR
T1 - GPGPU Performance Estimation with Core and Memory Frequency Scaling
AU - WANG, Qiang
AU - CHU, Xiaowen
N1 - Funding Information:
This research was supported by Hong Kong RGC GRF grants under the contracts HKBU 12200418. The authors would like to thank the anonymous reviewers whose comments have greatly improved this manuscript. See Supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2020.3004623.
PY - 2020/12/1
Y1 - 2020/12/1
AB - Contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling to balance computational performance and energy consumption. However, accurate and straightforward performance estimation for a given GPU kernel under different frequency settings is still lacking for real hardware, which is essential to determine the best frequency configuration for energy saving. In this article, we reveal a fine-grained analytical model to estimate the execution time of GPU kernels with both core and memory frequency scaling. Compared to the cycle-level simulators, which are too slow to apply on real hardware, our model only needs simple and one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters without any source code analysis. Our experimental results show that the proposed performance model can capture the kernel performance scaling behaviors under different frequency settings and achieve decent accuracy (average errors of 3.85, 8.6, 8.82, and 8.83 percent on a set of 20 GPU kernels with four modern Nvidia GPUs).
KW - dynamic voltage and frequency scaling
KW - GPU performance modeling
KW - Graphics processing units
UR - http://www.scopus.com/inward/record.url?scp=85088395090&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2020.3004623
DO - 10.1109/TPDS.2020.3004623
M3 - Journal article
AN - SCOPUS:85088395090
SN - 1045-9219
VL - 31
SP - 2865
EP - 2881
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 12
M1 - 9124659
ER -