计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 3
pp. 22-29, 116
9 pages in total
parallel computing; flow accumulation; Graphics Processing Unit (GPU); Open Computing Language (OpenCL)
Growing demand for large-scale, high-resolution digital terrain data poses new challenges for computationally intensive digital terrain analysis algorithms such as flow accumulation. Targeting the characteristics of heterogeneous CPU/GPU (Graphics Processing Unit) computing platforms, this paper proposes a parallelization strategy for the multiple flow direction flow accumulation algorithm based on OpenCL (Open Computing Language). The strategy offers better platform independence and portability, and simplifies parallel application design on heterogeneous CPU/GPU platforms. The parallel flow accumulation algorithm consists of two processes: outflow allocation, which is independent in both space and time, and inflow accumulation, which is spatially dependent. Both are defined as OpenCL kernels and executed in parallel on OpenCL devices. For inflow accumulation, a flow transfer matrix is used to convert the recursive formulation into an iterative one so that it can be computed in parallel. Compared with the algorithm based on the flow transfer matrix, the graph-theoretic variant based on a cell in-degree matrix reduces redundant computation during iteration, but it requires high-latency atomic operations and more iterations; with limited GPU computing resources, the two algorithms show no obvious difference in performance. Experimental results show that the parallel flow accumulation algorithm achieves a good speedup on an NVIDIA GeForce GT 650M GPU, and that the speedup increases with grid size: about 50 to 70 times for outflow allocation and 10 to 20 times for inflow accumulation. This demonstrates the potential advantages of using OpenCL for large-scale digital terrain analysis on parallel computing devices such as GPUs.
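The two-stage structure described in the abstract can be illustrated with a small serial sketch. This is not the authors' OpenCL implementation: the slope-proportional weighting, the unit source flow per cell, and the names `allocate_outflow` and `accumulate_inflow` are assumptions made for illustration only. The point is that each cell's outflow fractions depend only on its own neighbourhood, and that inflow accumulation can be written as repeated gather steps (each iteration moves flow one cell downstream) instead of a recursion, so every loop over cells could in principle become one work-item per cell.

```python
import math

# Offsets (row, col) of the 8 neighbours; index 7 - k is the opposite of index k.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def allocate_outflow(dem):
    """Stage 1: per-cell outflow fractions to the 8 neighbours.

    Assumed weighting: proportional to the positive downhill slope.
    Each cell is computed independently of all others.
    """
    rows, cols = len(dem), len(dem[0])
    frac = [[[0.0] * 8 for _ in range(cols)] for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            slopes = [0.0] * 8
            for k, (dr, dc) in enumerate(NEIGHBOURS):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    drop = dem[r][c] - dem[nr][nc]
                    if drop > 0:
                        slopes[k] = drop / math.hypot(dr, dc)
            total = sum(slopes)
            if total > 0:
                frac[r][c] = [s / total for s in slopes]
    return frac


def accumulate_inflow(frac):
    """Stage 2: iterative accumulation using the outflow fractions.

    `moving` holds the flow that arrived in the previous step; each
    iteration gathers it from upstream neighbours, so no atomics are needed.
    """
    rows, cols = len(frac), len(frac[0])
    acc = [[1.0] * cols for _ in range(rows)]     # assumed unit flow per cell
    moving = [[1.0] * cols for _ in range(rows)]
    for _ in range(rows * cols):                  # upper bound on path length
        nxt = [[0.0] * cols for _ in range(rows)]
        active = False
        for r in range(rows):
            for c in range(cols):
                inflow = 0.0
                for k, (dr, dc) in enumerate(NEIGHBOURS):
                    nr, nc = r + dr, c + dc       # candidate upstream neighbour
                    if 0 <= nr < rows and 0 <= nc < cols:
                        # 7 - k is the direction from that neighbour to (r, c)
                        inflow += moving[nr][nc] * frac[nr][nc][7 - k]
                nxt[r][c] = inflow
                acc[r][c] += inflow
                if inflow > 0:
                    active = True
        if not active:
            break
        moving = nxt
    return acc


if __name__ == "__main__":
    dem = [[2.0, 1.0], [1.0, 0.0]]                # small tilted surface
    print(accumulate_inflow(allocate_outflow(dem)))
```

The gather form recomputes contributions for cells whose inflow has already settled, which is the redundancy the abstract attributes to the transfer-matrix approach; an in-degree-based variant would avoid that at the cost of scattered, atomic updates.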