TY - JOUR
T1 - A parallel lattice Boltzmann method for large eddy simulation on multiple GPUs
AU - Li, Qinjian
AU - Zhong, Chengwen
AU - Li, Kai
AU - Zhang, Guangyong
AU - Lu, Xiaowei
AU - Zhang, Qing
AU - Zhao, Kaiyong
AU - Chu, Xiaowen
PY - 2014/6
Y1 - 2014/6
AB - To improve the simulation efficiency of turbulent fluid flows at high Reynolds numbers with large eddy dynamics, a CUDA-based lattice Boltzmann method for large eddy simulation (LES) using multiple graphics processing units (GPUs) is proposed. Our solution adopts the "collision after propagation" lattice evolution scheme and places the misaligned propagation phase in the global memory read process. The latest GPU platform allows a single CPU thread to control up to four GPUs running in parallel. To make use of multiple GPUs, the whole working set is evenly partitioned into sub-domains. We implement the Smagorinsky model and the Vreman model to verify our multi-GPU solution. These two LES models compute the relaxation time differently, which leads to different CUDA implementation characteristics. The implementation based on the Smagorinsky model achieves a 190-fold speedup over the sequential CPU implementation, while the implementation based on the Vreman model achieves a speedup of more than 90 times. The experimental results show that the parallel performance of our multi-GPU solution scales very well on multiple GPUs. Therefore, large-scale LES-LBM simulation (up to 10,240 × 10,240 lattices) becomes possible at low cost, even with double-precision floating-point arithmetic.
KW - CUDA
KW - GPU Computing
KW - Large eddy simulation
KW - Lattice Boltzmann method
UR - http://www.scopus.com/inward/record.url?scp=84901617575&partnerID=8YFLogxK
U2 - 10.1007/s00607-013-0356-7
DO - 10.1007/s00607-013-0356-7
M3 - Article
AN - SCOPUS:84901617575
SN - 0010-485X
VL - 96
SP - 479
EP - 501
JO - Computing
JF - Computing
IS - 6
ER -