计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 10
pp. 1162-1176
15 pages in total
吴再龙%张云泉%徐建良%贾海鹏%颜深根%解庆春
OpenCL%并行计算%Kmeans%迭代算法%跨平台
OpenCL%parallel computing%Kmeans%iterative algorithm%cross-platform
Kmeans算法是无监督机器学习中一种典型的聚类算法,是对已知数据集进行划分和分组的重要方法,在图像处理、数据挖掘、生物学领域有着广泛的应用。随着实际应用中数据规模的不断变大,对Kmeans算法的性能也提出了更高的要求。在充分考虑不同硬件平台体系架构差异的基础上,系统地研究了Kmeans算法在GPU和APU平台上实现与优化的关键技术:片上全局同步高效实现,冗余计算减少全局同步次数,线程任务重映射,局部内存重用等,实现了Kmeans算法在不同硬件平台上的高性能与性能移植。实验结果表明,优化后的算法在考虑数据传输时间的前提下,在AMD HD7970 GPU上相对于CPU版本取得136.975~170.333倍的加速比,在AMD A10-5800K APU上相对于CPU版本取得22.2365~24.3865倍的加速比,有效验证了优化方法的有效性和平台的可移植性。
Kmeans is a typical clustering algorithm in unsupervised machine learning and an important method for partitioning and grouping a given data set. It is widely used in image processing, data mining and biology. As data sets in real applications keep growing, the demands on the performance of Kmeans grow with them. Taking full account of the architectural differences between hardware platforms, this paper systematically studies the key techniques for implementing and optimizing Kmeans on GPU and APU platforms based on OpenCL. Using several optimization methods, namely an efficient implementation of on-chip global synchronization, redundant computation to reduce the number of global synchronizations, remapping of thread tasks, and reuse of local memory, the paper achieves a high-performance implementation of Kmeans that is performance-portable across hardware architectures, and it summarizes the optimization methods suitable for iterative algorithms. The experimental results show that, with data transfer time included, the optimized algorithm achieves a speedup of 136.975 to 170.333 times over the CPU version on the AMD HD7970 GPU and a speedup of 22.2365 to 24.3865 times over the CPU version on the AMD A10-5800K APU, which verifies the effectiveness and the cross-platform portability of the proposed optimization methods.
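For readers unfamiliar with the iteration the abstract refers to, the following is a minimal sequential sketch of the standard Kmeans (Lloyd) loop in Python. It is not the authors' OpenCL implementation; the function name `kmeans`, its parameters, and the empty-cluster handling are assumptions made for illustration. It only shows why each iteration contains a point at which all assignments must be complete before centroids can be updated, which is the global synchronization the listed optimizations target.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(points, centroids, max_iters=100):
    """Plain sequential Kmeans sketch; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: every point is handled independently, so in a
        # parallel version this maps naturally to one work-item per point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]

        # Update step: needs all labels from the step above, hence one
        # global synchronization per iteration in a parallel version.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)

        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels
```

A small usage example with two well-separated groups of 2D points:

```python
cents, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```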