科研信息化技术与应用 (E-science Technology & Application)
2013, No. 1, pp. 57-66 (10 pages)
吴再龙 (Wu Zailong), 张云泉 (Zhang Yunquan), 龙国平 (Long Guoping), 徐建良 (Xu Jianliang), 贾海鹏 (Jia Haipeng)
OpenCL; parallel computing; image remap; cross-platform
The remap algorithm is a typical image-transformation algorithm, widely used in image scaling, warping, rotation, and related operations. As image sizes and resolutions continue to grow, ever higher performance is demanded of image remapping. Taking full account of the architectural differences among GPU platforms, this paper systematically studies how the remap algorithm can be implemented efficiently on different GPU platforms under the OpenCL framework. It examines how individual optimizations, including off-chip (global) memory access optimization, vectorized computation, and reduction of dynamic instructions such as branches, affect performance on each platform, and discusses the possibility of performance portability across GPU platforms. Experimental results show that, excluding data transfer time, the optimized algorithm achieves speedups of 114.3~491.5x over the CPU version and 1.01~1.86x over the CUDA version (the existing GPU implementation) on an AMD HD5850 GPU, and speedups of 100.7~369.8x over the CPU version and 0.95~1.58x over the CUDA version on an NVIDIA C2050 GPU. These results confirm the effectiveness and performance portability of the optimization methods proposed in this paper.
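For readers unfamiliar with the algorithm, the sketch below shows a minimal OpenCL remap kernel of the kind the abstract refers to. It is not taken from the paper: the kernel and argument names are illustrative, it uses simple nearest-neighbour sampling on a single-channel image, and it omits the optimizations the authors evaluate (vectorized uchar4/float4 processing, tuned global-memory access patterns, and branch reduction).

/* Illustrative OpenCL remap kernel sketch (nearest-neighbour sampling).
 * dst(x, y) = src(map_x(x, y), map_y(x, y))
 * Names and layout are hypothetical, not the paper's implementation. */
__kernel void remap_nn(__global const uchar *src,   /* source image, width*height bytes (grayscale) */
                       __global uchar *dst,         /* destination image, width*height bytes */
                       __global const float *map_x, /* per-pixel source x coordinate */
                       __global const float *map_y, /* per-pixel source y coordinate */
                       const int width,
                       const int height)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    if (x >= width || y >= height)
        return;

    const int idx = y * width + x;

    /* Round the mapped coordinates to the nearest source pixel. */
    const int sx = convert_int_rte(map_x[idx]);
    const int sy = convert_int_rte(map_y[idx]);

    /* Coordinates that fall outside the source image get a constant border value. */
    uchar value = 0;
    if (sx >= 0 && sx < width && sy >= 0 && sy < height)
        value = src[sy * width + sx];

    dst[idx] = value;
}

On a VLIW GPU such as the HD5850, a common next step is to let each work-item process several adjacent pixels with uchar4/float4 loads and stores so the vector units stay busy, while on a scalar architecture such as the C2050 the emphasis is usually on coalesced global-memory accesses; this is the kind of platform-dependent tuning the abstract describes.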