TY - GEN
T1 - Benchmarking the memory hierarchy of modern GPUs
AU - Mei, Xinxin
AU - Zhao, Kaiyong
AU - Liu, Chengjian
AU - CHU, Xiaowen
N1 - Copyright: Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014
Y1 - 2014
AB - Memory access efficiency is a key factor for fully exploiting the computational power of Graphics Processing Units (GPUs). However, many details of the GPU memory hierarchy are not released by the vendors. We propose a novel fine-grained benchmarking approach and apply it to two popular GPUs, namely Fermi and Kepler, to expose previously unknown characteristics of their memory hierarchies. Specifically, we investigate the structures of different cache systems, such as the data cache, the texture cache, and the translation lookaside buffer (TLB). We also investigate the impact of bank conflicts on shared memory access latency. Our benchmarking results offer a better understanding of the GPU memory hierarchy, which can help with software optimization and the modelling of GPU architectures. Our source code and experimental results are publicly available.
UR - http://www.scopus.com/inward/record.url?scp=84906739510&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-44917-2_13
DO - 10.1007/978-3-662-44917-2_13
M3 - Conference proceeding
AN - SCOPUS:84906739510
SN - 9783662449165
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 144
EP - 156
BT - Network and Parallel Computing - 11th IFIP WG 10.3 International Conference, NPC 2014, Proceedings
PB - Springer Verlag
T2 - 11th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2014
Y2 - 18 September 2014 through 20 September 2014
ER -