软件学报
軟件學報
연건학보
JOURNAL OF SOFTWARE
2014年
6期
1154-1168
,共15页
刘晓娴%赵荣彩%赵捷%徐金龙
劉曉嫻%趙榮綵%趙捷%徐金龍
류효한%조영채%조첩%서금룡
流水并行%自动并行%DOACROSS循环%代价模型
流水併行%自動併行%DOACROSS循環%代價模型
류수병행%자동병행%DOACROSS순배%대개모형
pipeline parallel%automatic parallelization%DOACROSS loops%cost model
发掘DOACROSS循环中蕴含的并行性,选择合适的策略将其并行执行,对提升程序的并行性能非常重要。流水并行方式是规则 DOACROSS 循环并行的重要方式。自动生成性能良好的流水并行代码是一项困难的工作,并行编译器对程序自动并行时常常对DOACROSS循环作保守处理,损失了DOACROSS循环包含的并行性,限制了程序的并行性能。针对上述问题,设计了一种选择计算划分循环层和循环分块层的启发式算法,给出了一个基于流水并行代价模型的循环分块大小计算公式,并使用计数信号量进行并行线程之间的同步,实现了基于 OpenMP 的规则DOACROSS 循环流水并行代码的自动生成。通过对有限差分松弛法(finite difference relaxation,简称 FDR)的波前(wavefront)循环和时域有限差分法(finite difference time domain,简称 FDTD)中典型循环以及程序 Poisson,LU 和Jacobi 的测试,算法自动生成的流水并行代码能够在多核处理器上获得明显的性能提升,使用的流水分块大小计算公式能够较为精确地计算出循环流水并行时的最佳分块大小。自动生成的流水并行代码与基于手工选择的最优分块大小的流水并行代码相比,加速比达到手工选择加速比的89%。
髮掘DOACROSS循環中蘊含的併行性,選擇閤適的策略將其併行執行,對提升程序的併行性能非常重要。流水併行方式是規則 DOACROSS 循環併行的重要方式。自動生成性能良好的流水併行代碼是一項睏難的工作,併行編譯器對程序自動併行時常常對DOACROSS循環作保守處理,損失瞭DOACROSS循環包含的併行性,限製瞭程序的併行性能。針對上述問題,設計瞭一種選擇計算劃分循環層和循環分塊層的啟髮式算法,給齣瞭一箇基于流水併行代價模型的循環分塊大小計算公式,併使用計數信號量進行併行線程之間的同步,實現瞭基于 OpenMP 的規則DOACROSS 循環流水併行代碼的自動生成。通過對有限差分鬆弛法(finite difference relaxation,簡稱 FDR)的波前(wavefront)循環和時域有限差分法(finite difference time domain,簡稱 FDTD)中典型循環以及程序 Poisson,LU 和Jacobi 的測試,算法自動生成的流水併行代碼能夠在多覈處理器上穫得明顯的性能提升,使用的流水分塊大小計算公式能夠較為精確地計算齣循環流水併行時的最佳分塊大小。自動生成的流水併行代碼與基于手工選擇的最優分塊大小的流水併行代碼相比,加速比達到手工選擇加速比的89%。
발굴DOACROSS순배중온함적병행성,선택합괄적책략장기병행집행,대제승정서적병행성능비상중요。류수병행방식시규칙 DOACROSS 순배병행적중요방식。자동생성성능량호적류수병행대마시일항곤난적공작,병행편역기대정서자동병행시상상대DOACROSS순배작보수처리,손실료DOACROSS순배포함적병행성,한제료정서적병행성능。침대상술문제,설계료일충선택계산화분순배층화순배분괴층적계발식산법,급출료일개기우류수병행대개모형적순배분괴대소계산공식,병사용계수신호량진행병행선정지간적동보,실현료기우 OpenMP 적규칙DOACROSS 순배류수병행대마적자동생성。통과대유한차분송이법(finite difference relaxation,간칭 FDR)적파전(wavefront)순배화시역유한차분법(finite difference time domain,간칭 FDTD)중전형순배이급정서 Poisson,LU 화Jacobi 적측시,산법자동생성적류수병행대마능구재다핵처리기상획득명현적성능제승,사용적류수분괴대소계산공식능구교위정학지계산출순배류수병행시적최가분괴대소。자동생성적류수병행대마여기우수공선택적최우분괴대소적류수병행대마상비,가속비체도수공선택가속비적89%。
To obtain as much performance improvement as possible for sequential applications, it is important to exploit parallelism lurking in DOACROSS loops and find good schemes for their parallel execution. Pipelining is such a parallelizing method which can work well for regular DOACROSS loops. However, it is so hard to maintain high performance pipeline parallel codes automatically that parallel compilers always treat DOACROSS loops conservatively. Compilers usually serialize DOACROSS loops, which loses the inherent parallelism of DOACROSS loops and affects the performance of generated parallel programs. To solve this problem, automatic generation of pipeline parallel code for regular DOACROSS loops is implemented for multicore platform based on OpenMP. Firstly, a heuristic is proposed to choose the partition loop and the tiling loop of regular DOACROSS loops. Secondly, a formula based on pipelining cost model is given to compute the optimal tiling size. Lastly, the synchronization between threads is implemented with counter semaphores. Measuring against the wavefront loops of finite difference relaxation, the representative loops of finite difference time domain, and Poisson, LU and Jacobi procedures, the pipeline parallel loops automatically generated by the proposed method increase execution efficiency on multicore platform with the average speedup up to 89%of the optimal speedup obtained manually.