Implementation and optimisation of the cdugksFoam solver on the Sunway TaihuLight supercomputer

Jie Guo; Yunlan Wang; Rui Zhang; Feifei Zhang; Tianhai Zhao; Congshan Zhuo; Sha Liu; Chengwen Zhong

doi:10.1016/j.cpc.2024.109455


School of Aeronautics

Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal › Article › peer-review

Abstract

In this study, the cdugksFoam solver was implemented on the Sunway TaihuLight system using the MPI + Athread programming model. To fully utilise the heterogeneous SW26010 many-core processor, we implemented three levels of parallelisation: MPI process-level hybrid parallelisation across physical space and velocity space, thread-level parallelisation that further partitions physical space across the computing processing elements (CPEs), and single-instruction multiple-data (SIMD) vectorisation. To address the performance bottleneck caused by the low memory bandwidth of the SW26010 processor, we designed and implemented a series of optimisations, including kernel fusion, transcendental function optimisation, and a software-managed cache (soft cache), which reduce discrete (non-contiguous) memory accesses and improve the computational efficiency of the CPEs. The accuracy of the optimised program was validated through simulations of the 3D lid-driven cavity flow and the rarefied supersonic flow past a sphere. Across multiple grid scales, the optimised solver achieved an overall speedup of more than 5 times over running on the management processing elements (MPEs) alone. In both strong and weak scalability tests, a parallel efficiency exceeding 90% was achieved.

Original language	English
Article number	109455
Journal	Computer Physics Communications
Volume	308
DOIs	https://doi.org/10.1016/j.cpc.2024.109455
State	Published - Mar 2025

Keywords

Discrete unified gas kinetic scheme
High-performance computing
OpenFOAM
Sunway TaihuLight supercomputer


Cite this

@article{7af66970bb1843c78b6a58db1ffe65da,

title = "Implementation and optimisation of the cdugksFoam solver on the Sunway TaihuLight supercomputer",

abstract = "In this study, the cdugksFoam solver was successfully implemented on the Sunway TaihuLight system using the MPI + Athread programming model. To utilise the heterogeneous SW26010 many-core processor fully, we implemented three levels of parallelisation: MPI process-level hybrid parallelisation in physical space and velocity space, thread-level parallelisation to further partition physical space, and single-instruction multiple-data (SIMD) vectorisation. To address the performance bottleneck caused by the low memory bandwidth of the SW26010 processor, a series of optimisation methods, including kernel fusion, transcendental function optimisation, and soft cache, were designed and implemented to reduce discrete memory access and improve the computational efficiency of CPEs. The accuracy of the optimised program was validated through simulations of the 3D lid-driven cavity flow and rarefied supersonic flow past a sphere. Based on the experimental results from multiple grid scales, the overall performance achieved an acceleration of over 5 times compared to running on the management processing elements. In both strong and weak scalability tests, a parallel efficiency exceeding 90\% was achieved.",

keywords = "Discrete unified gas kinetic scheme, High-performance computing, OpenFOAM, Sunway TaihuLight supercomputer",

author = "Jie Guo and Yunlan Wang and Rui Zhang and Feifei Zhang and Tianhai Zhao and Congshan Zhuo and Sha Liu and Chengwen Zhong",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2025",

month = mar,

doi = "10.1016/j.cpc.2024.109455",

language = "English",

volume = "308",

journal = "Computer Physics Communications",

issn = "0010-4655",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Implementation and optimisation of the cdugksFoam solver on the Sunway TaihuLight supercomputer

AU - Guo, Jie

AU - Wang, Yunlan

AU - Zhang, Rui

AU - Zhang, Feifei

AU - Zhao, Tianhai

AU - Zhuo, Congshan

AU - Liu, Sha

AU - Zhong, Chengwen

PY - 2025/3

Y1 - 2025/3

N2 - In this study, the cdugksFoam solver was successfully implemented on the Sunway TaihuLight system using the MPI + Athread programming model. To utilise the heterogeneous SW26010 many-core processor fully, we implemented three levels of parallelisation: MPI process-level hybrid parallelisation in physical space and velocity space, thread-level parallelisation to further partition physical space, and single-instruction multiple-data (SIMD) vectorisation. To address the performance bottleneck caused by the low memory bandwidth of the SW26010 processor, a series of optimisation methods, including kernel fusion, transcendental function optimisation, and soft cache, were designed and implemented to reduce discrete memory access and improve the computational efficiency of CPEs. The accuracy of the optimised program was validated through simulations of the 3D lid-driven cavity flow and rarefied supersonic flow past a sphere. Based on the experimental results from multiple grid scales, the overall performance achieved an acceleration of over 5 times compared to running on the management processing elements. In both strong and weak scalability tests, a parallel efficiency exceeding 90% was achieved.

AB - In this study, the cdugksFoam solver was successfully implemented on the Sunway TaihuLight system using the MPI + Athread programming model. To utilise the heterogeneous SW26010 many-core processor fully, we implemented three levels of parallelisation: MPI process-level hybrid parallelisation in physical space and velocity space, thread-level parallelisation to further partition physical space, and single-instruction multiple-data (SIMD) vectorisation. To address the performance bottleneck caused by the low memory bandwidth of the SW26010 processor, a series of optimisation methods, including kernel fusion, transcendental function optimisation, and soft cache, were designed and implemented to reduce discrete memory access and improve the computational efficiency of CPEs. The accuracy of the optimised program was validated through simulations of the 3D lid-driven cavity flow and rarefied supersonic flow past a sphere. Based on the experimental results from multiple grid scales, the overall performance achieved an acceleration of over 5 times compared to running on the management processing elements. In both strong and weak scalability tests, a parallel efficiency exceeding 90% was achieved.

KW - Discrete unified gas kinetic scheme

KW - High-performance computing

KW - OpenFOAM

KW - Sunway TaihuLight supercomputer

UR - http://www.scopus.com/inward/record.url?scp=85210270214&partnerID=8YFLogxK

U2 - 10.1016/j.cpc.2024.109455

DO - 10.1016/j.cpc.2024.109455

M3 - Article

AN - SCOPUS:85210270214

SN - 0010-4655

VL - 308

JO - Computer Physics Communications

JF - Computer Physics Communications

M1 - 109455

ER -
