应用气象学报
QUARTERLY JOURNAL OF APPLIED METEOROLOGY
2014, No. 5, pp. 581-591 (11 pages)
Keywords: hybrid parallel; numerical weather forecasting model; domain decomposition; loop-level parallelization
With the development of multi-core computing technology, cluster systems based on multi-core processors have gradually become the mainstream architecture. To suit this hardware architecture, which combines distributed and shared memory, the hybrid MPI and OpenMP programming model achieves two-level parallelism, between nodes and within nodes, by combining message passing with shared-memory parallel processing: MPI is used for inter-node communication and OpenMP for intra-node parallel computation. This paper adopts the hybrid MPI and OpenMP parallel model and applies two methods, domain-decomposition parallelization and loop-level parallelization, to design and optimize a hybrid MPI and OpenMP parallel scheme for the GRAPES global model. Experimental results show that the hybrid method can raise the model's degree of parallelism beyond that of pure MPI: with the same number of computing cores, hybrid schemes using up to 4 threads outperform the single-MPI scheme, but with more than 4 threads parallel performance drops significantly.
Clustered SMP systems are becoming the mainstream architecture, as advances in multi-core technology allow larger numbers of CPU cores to share a single memory space. To exploit this hardware architecture, which combines distributed and shared memory, a hybrid MPI and OpenMP parallel programming model is a natural choice. This hierarchical model achieves both inter-node and intra-node parallelization by combining message passing with thread-based shared-memory parallelism within the same application: MPI handles coarse-grained communication between SMP nodes, while OpenMP threads perform fine-grained computation within each node.

As a typical large-scale, computation- and storage-intensive numerical weather forecasting application, GRAPES (Global/Regional Assimilation and PrEdiction System) has been developed into an MPI version and put into operational use. To adapt it to SMP cluster systems and achieve higher scalability, a hybrid MPI and OpenMP parallel model suitable for the GRAPES global model is developed, introducing a horizontal domain decomposition method and loop-level parallelization. In the horizontal domain decomposition method, the whole forecasting domain is divided into patches, and each patch is in turn uniformly divided into several tiles. Performing parallel operations on tiles has two main advantages. First, tile-level parallelization, which applies OpenMP at a high level, is to some extent coarse-grained parallelism.
Compared with the computational work associated with each tile, the OpenMP thread overhead is negligible. Second, the implementation is relatively simple; subroutine thread safety is the only thing that must be ensured. Loop-level parallelization, which can mitigate load imbalance by adopting different thread scheduling policies, is fine-grained parallelism: OpenMP parallel directives are applied directly to the main computational loops. Horizontal domain decomposition is the preferred method for uniform grid computation, while loop-level parallelization is preferred for non-uniform grid computation and for thread-unsafe procedures. Experiments with a 1°×1° dataset are performed, and timings of the main subroutines of the integration are compared. Hybrid parallel performance is superior to the single-MPI scheme for the long-wave radiation, microphysics, and land surface processes, whereas the generalized conjugate residual (GCR) solver for the Helmholtz equation is difficult to thread-parallelize in its incomplete LU (ILU) factorization preconditioner part; applying tile-level parallelization to the ILU part improves GCR's hybrid parallelization. For the short-wave radiation process, hybrid performance is close to the single-MPI scheme on the same number of computing cores. In the hybrid scheme, elapsed time decreases as the number of threads increases for a fixed number of MPI processes, and with at most four threads the hybrid scheme outperforms the single-MPI scheme in large-scale experiments. The hybrid scheme also achieves better scalability than the single-MPI scheme. The experiments show that the hybrid MPI and OpenMP parallel scheme is suitable for the GRAPES global model.