计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2015
Issue 9
pp. 1084-1092
9 pages
赵莲%赵永华%陈尧%赵慰
近似逆%预条件%迭代法%异构并行计算%GPU集群
approximate inverse%preconditioner%iterative method%heterogeneous parallel computing%GPU cluster
针对GPU集群系统,研究了分解近似逆(approximate inverse,AINV)和对称逐次超松弛-近似逆(symmetric successive over relaxation approximate inverse,SSOR-AI)两类近似逆预条件的并行算法。采用多级k-路图划分方法,通过子图的内点和边界点识别方法以及稀疏矩阵的置换技术,提出了将稀疏矩阵转换为分块箭形矩阵的并行方法。基于所形成的分块箭形矩阵,结合块内稀疏矩阵近似逆串行、块间并行的策略给出了近似逆预条件的并行方法,实现了AINV和SSOR-AI并行算法,解决了AINV预条件难以并行的问题。基于CPU与GPU协同计算、主机端页锁定内存和设备端计算与通信重叠的优化技术,实现了并行近似逆预条件与共轭梯度(conjugate gradient,CG)算法相结合的线性方程组混合并行求解器。数值实验表明,所提方法对AINV和SSOR-AI两类近似逆预条件,在多GPU上获得了很好的可扩展性和加速效果。
This paper studies parallel algorithms for two approximate inverse preconditioners on GPU cluster systems: the factorized approximate inverse (AINV) and the symmetric successive over-relaxation approximate inverse (SSOR-AI). Using multilevel k-way graph partitioning, together with a method for identifying the interior and boundary vertices of each subgraph and a sparse matrix permutation, it proposes a parallel method for transforming a sparse matrix into block arrow form. On the resulting block arrow matrix, the approximate inverse is computed sequentially within each block and in parallel across blocks, which yields parallel AINV and SSOR-AI algorithms and addresses the difficulty of parallelizing the AINV preconditioner. Using CPU-GPU collaborative computing, page-locked host memory, and overlap of computation and communication on the device, the parallel approximate inverse preconditioners are combined with the conjugate gradient (CG) algorithm to obtain a hybrid parallel solver for linear systems. Numerical experiments indicate that, for both the AINV and SSOR-AI preconditioners, the proposed method achieves good scalability and speedup on multiple GPUs.
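The reordering into block arrow form described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a k-way partition vector is already available (for example from a graph partitioner), treats every vertex with a neighbor in another part as a boundary vertex, and uses the hypothetical helper names block_arrow_permutation and permuted_pattern. Interior vertices are numbered part by part and all boundary vertices are numbered last, so interior blocks of different parts do not couple to each other.

```python
def block_arrow_permutation(adj, part, k):
    """Return (perm, sizes) for a block arrow ordering.

    adj  : list of neighbor lists (symmetric sparsity pattern, assumed)
    part : part[v] in range(k), an assumed precomputed k-way partition
    perm : perm[new_index] = old_index
    sizes: sizes of the k interior blocks followed by the border block
    """
    interior = [[] for _ in range(k)]
    boundary = []
    for v in range(len(adj)):
        # Boundary vertex: at least one neighbor lies in a different part.
        if any(part[u] != part[v] for u in adj[v]):
            boundary.append(v)
        else:
            interior[part[v]].append(v)
    perm = [v for block in interior for v in block] + boundary
    sizes = [len(block) for block in interior] + [len(boundary)]
    return perm, sizes


def permuted_pattern(adj, perm):
    """Dense 0/1 pattern of the symmetrically permuted matrix (small demos only)."""
    pos = {old: new for new, old in enumerate(perm)}
    n = len(adj)
    pattern = [[0] * n for _ in range(n)]
    for v in range(n):
        pattern[pos[v]][pos[v]] = 1  # diagonal entry
        for u in adj[v]:
            pattern[pos[v]][pos[u]] = 1
    return pattern


# Demo: path graph 0-1-2-3-4-5 split into two parts of three vertices.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
part = [0, 0, 0, 1, 1, 1]
perm, sizes = block_arrow_permutation(adj, part, 2)
pattern = permuted_pattern(adj, perm)
for row in pattern:
    print(row)
```

In the permuted pattern, the two interior diagonal blocks are independent of each other and connect only to the trailing border block, which is the property that allows each block to be processed sequentially while different blocks are processed in parallel.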