计算机工程
計算機工程
계산궤공정
COMPUTER ENGINEERING
2015年
2期
189-192,198
,共5页
K-近邻问题%图形处理器%并行计算%算法加速%合并访问%全局存储器
K-近鄰問題%圖形處理器%併行計算%算法加速%閤併訪問%全跼存儲器
K-근린문제%도형처리기%병행계산%산법가속%합병방문%전국존저기
K-nearest Neighbor ( KNN ) problem%Graphics Processing Unit ( GPU )%parallel computing%algorithm acceleration%coalesced access%global memory
K-近邻计算在数据集规模较大时计算复杂度较高,因此,利用图形处理器( GPU )强大的并行计算能力对K-近邻算法进行加速。在分析现有K-近邻算法的基础上,针对该算法时间开销过大的问题,结合GPU的体系结构特征实现基于GPU的K-近邻算法。利用全局存储器的合并访问特性,提高GPU全局存储器访问数据的效率,通过事先过滤数据的方法来减少参与排序的数据量,进而减少排序阶段的线程串行化时间。在 KDD, Poker, Covertype 3个数据集上进行实验,结果表明,该实现方法在距离计算阶段每秒执行的浮点运算次数为266.37×109次,而排序阶段为26.47×109次,优于已有方法。
K-近鄰計算在數據集規模較大時計算複雜度較高,因此,利用圖形處理器( GPU )彊大的併行計算能力對K-近鄰算法進行加速。在分析現有K-近鄰算法的基礎上,針對該算法時間開銷過大的問題,結閤GPU的體繫結構特徵實現基于GPU的K-近鄰算法。利用全跼存儲器的閤併訪問特性,提高GPU全跼存儲器訪問數據的效率,通過事先過濾數據的方法來減少參與排序的數據量,進而減少排序階段的線程串行化時間。在 KDD, Poker, Covertype 3箇數據集上進行實驗,結果錶明,該實現方法在距離計算階段每秒執行的浮點運算次數為266.37×109次,而排序階段為26.47×109次,優于已有方法。
K-근린계산재수거집규모교대시계산복잡도교고,인차,이용도형처리기( GPU )강대적병행계산능력대K-근린산법진행가속。재분석현유K-근린산법적기출상,침대해산법시간개소과대적문제,결합GPU적체계결구특정실현기우GPU적K-근린산법。이용전국존저기적합병방문특성,제고GPU전국존저기방문수거적효솔,통과사선과려수거적방법래감소삼여배서적수거량,진이감소배서계단적선정천행화시간。재 KDD, Poker, Covertype 3개수거집상진행실험,결과표명,해실현방법재거리계산계단매초집행적부점운산차수위266.37×109차,이배서계단위26.47×109차,우우이유방법。
K-nearest Neighbor( KNN) is a classical problem whose computational complexity increases rapidly with the size of data set. It is an interesting research to accelerate KNN implementation on the Graphics Processor Unit( GPU) by employing GPU’ s massive parallel computing power. For its heavy overhead on time,after analyzing the existing work of GPU-based KNN implementations and the architectural features of GPU, this paper efficiently parallelizes KNN on the GPU. It optimizes data access by making good use of the coalesced access power of global memory,and reduces thread serialization by filtering out as much data as possible in advance that is to be sorted. Experiments on KDD,Poker and Covertype datasets and comparisons with some existing methods show that the number of floating point arithmetic of executed per second of this distance computing method is up to 266. 37 × 109 ,and is up to 26. 47 × 109 in sort phase, which are superior to that of existed methods.