ACTA ELECTRONICA SINICA
2015
Issue 8
pp. 1642-1650
9 pages total
Liu Shuyong (刘书勇)%Wu Yanxia (吴艳霞)%Zhang Bowei (张博为)%Zhang Guoyin (张国印)%Dai Kui (戴葵)
matrix triangularization decomposition%triangularization process%parallel algorithm%LU decomposition%field programmable gate array
Reconfigurable computing systems have become an important option for accelerating compute-intensive applications. Among the many compute-intensive problems that have drawn attention, matrix triangularization decomposition, a typical foundational workload, has remained central to research and is of great value for solving linear systems of equations and matrix eigenvalue problems in science and engineering. Targeting the triangularization process common to matrix triangularization decompositions, this paper analyzes the linear computation pattern of that process and proposes a unified sub-matrix update algorithm suited to parallel hardware implementation, together with a high-performance parallel hardware structure template for matrix triangularization on an FPGA (Field Programmable Gate Array). The work then studies the high-performance implementation and optimization of LU decomposition on this parallel structure template. Theoretical analysis indicates that the proposed algorithm offers higher data parallelism and pipeline parallelism during the triangularization process. Experimental results show that the proposed FPGA structure achieves a speedup of more than 10x in key computational performance compared with software implementations on general-purpose processors and with previous work.