CAJ | Academic paper

Multicore and manycore processors have become mainstream architectures in high-performance computing, and OpenMP programming is one of the primary means of exploiting their parallel computing capabilities. Using a systematic approach that combines hardware-performance-counter-based measurement with model-based analysis, this paper evaluates the OpenMP performance of a real-world, high-order, structured-grid CFD (computational fluid dynamics) application on the Intel Xeon E5 Sandy Bridge multicore processor and the Intel Knights Corner many-integrated-core coprocessor. The analysis focuses on how OpenMP library overhead, load balance among OpenMP threads, and main-memory access bandwidth affect the application's performance. The results show that the redundant computation introduced by OpenMP parallelization has little effect on parallel efficiency, whereas the serial portion and load imbalance affect it significantly. Main-memory access bandwidth, in turn, significantly affects the achieved floating-point performance. The paper also compares the application's performance on the two architectures and discusses directions for further performance tuning.