TY - GEN
T1 - Implementation of a lattice Boltzmann method for large eddy simulation on multiple GPUs
AU - Li, Qinjian
AU - Zhong, Chengwen
AU - Li, Kai
AU - Zhang, Guangyong
AU - Lu, Xiaowei
AU - Zhang, Qing
AU - Zhao, Kaiyong
AU - Chu, Xiaowen
N1 - Copyright 2012 Elsevier B.V., All rights reserved.
PY - 2012
Y1 - 2012
AB - Recently, the graphics processing unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth. To improve the efficiency of simulating complex flow phenomena in computational fluid dynamics, a CUDA-based large eddy simulation algorithm using multiple GPUs is proposed. Our implementation adopted the "collision after propagation" scheme and performed the propagation step through global memory read transactions. For simplicity, the working set is split into equal sub-domains, each assigned to one GPU. With recently released hardware, up to four GPUs can be controlled by a single CPU thread and run in parallel. The results show that our multi-GPU implementation can run rather large simulations (meshes of 10240 x 10240) even in double-precision floating point, and achieved a 190x speedup over the sequential CPU implementation.
KW - Large eddy simulation
KW - Lattice Boltzmann method
KW - Multi-GPU Computing
KW - Parallel computing
UR - http://www.scopus.com/inward/record.url?scp=84870422464&partnerID=8YFLogxK
U2 - 10.1109/HPCC.2012.115
DO - 10.1109/HPCC.2012.115
M3 - Conference contribution
AN - SCOPUS:84870422464
SN - 9780769547497
T3 - Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
SP - 818
EP - 823
BT - Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
T2 - 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
Y2 - 25 June 2012 through 27 June 2012
ER -
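The abstract above names two implementation ideas: "collision after propagation" with propagation done as global memory reads (a pull, or gather, stream), and an equal split of the working set across GPUs. The following is a minimal sketch of both ideas, assuming a D2Q9 lattice, a plain BGK collision (no Smagorinsky subgrid model, so it is not large eddy simulation), periodic boundaries, and a split into equal strips of rows with one halo row on each side. It is written in plain Python on the CPU rather than CUDA, and every name in it (init, step_global, step_split, split_rows) is hypothetical; it is not the authors' code.

```python
# Hypothetical sketch: D2Q9 LBM, "collision after propagation" with pull streaming,
# plus an equal-strip domain split with one-row halos (stand-in for multi-GPU).
# Plain BGK collision only; periodic boundaries. Not the paper's implementation.

EX = [0, 1, 0, -1, 0, 1, -1, -1, 1]   # D2Q9 lattice velocities, x component
EY = [0, 0, 1, 0, -1, 1, 1, -1, -1]   # D2Q9 lattice velocities, y component
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
Q = 9

def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium populations for one node."""
    usq = ux * ux + uy * uy
    feq = []
    for i in range(Q):
        eu = EX[i] * ux + EY[i] * uy
        feq.append(W[i] * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq

def collide(cell, tau):
    """BGK relaxation of the populations already gathered at one node."""
    rho = sum(cell)
    ux = sum(EX[i] * cell[i] for i in range(Q)) / rho
    uy = sum(EY[i] * cell[i] for i in range(Q)) / rho
    feq = equilibrium(rho, ux, uy)
    return [cell[i] - (cell[i] - feq[i]) / tau for i in range(Q)]

def step_rows(f, rows, nx, tau, src_row):
    """Pull-then-collide for the given rows; f[y][x] is a list of Q populations.
    src_row(y, dy) gives the row to read from when pulling along dy."""
    new = []
    for y in rows:
        out_row = []
        for x in range(nx):
            # Propagation as a gather: population i comes from the upstream node x - e_i.
            pulled = [f[src_row(y, EY[i])][(x - EX[i]) % nx][i] for i in range(Q)]
            out_row.append(collide(pulled, tau))   # collision after propagation
        new.append(out_row)
    return new

def step_global(f, nx, ny, tau):
    """Reference step on the whole periodic domain."""
    return step_rows(f, range(ny), nx, tau, lambda y, dy: (y - dy) % ny)

def split_rows(ny, n_parts):
    """Equal strips of rows, one per (simulated) device."""
    if ny % n_parts:
        raise ValueError("ny must be divisible by n_parts")
    h = ny // n_parts
    return [(k * h, (k + 1) * h) for k in range(n_parts)]

def step_split(f, nx, ny, tau, n_parts):
    """Same step, done strip by strip; each strip sees only its rows plus two halo rows."""
    new = []
    for y0, y1 in split_rows(ny, n_parts):
        # Halo exchange stand-in: copy the neighbouring rows (periodic in y).
        local = [f[(y0 - 1) % ny]] + f[y0:y1] + [f[y1 % ny]]
        new += step_rows(local, range(1, y1 - y0 + 1), nx, tau, lambda r, dy: r - dy)
    return new

def init(nx, ny):
    """Equilibrium start with a small density gradient and a sparse x-velocity."""
    return [[equilibrium(1.0 + 0.01 * x, 0.05 if (x + y) % 3 == 0 else 0.0, 0.0)
             for x in range(nx)] for y in range(ny)]

if __name__ == "__main__":
    nx, ny, tau = 8, 8, 0.8
    a = b = init(nx, ny)
    for _ in range(5):
        a = step_global(a, nx, ny, tau)
        b = step_split(b, nx, ny, tau, 4)
    diff = max(abs(a[y][x][i] - b[y][x][i])
               for y in range(ny) for x in range(nx) for i in range(Q))
    print("max difference between global and split runs:", diff)
```

Because each node only reads its upstream neighbours and writes its own cell, the pull form needs no write conflicts, which is one reason a gather-style propagation maps naturally onto GPU global memory reads. In this sketch the split run reproduces the single-domain run as long as the one-row halos are refreshed before every step; on real hardware that refresh would be an inter-GPU copy.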