电子学报
Acta Electronica Sinica
2015
Issue 11
pp. 2151-2160 (10 pages)
reconfigurable cell array; temporal mapping; accumulation probability weight; asynchronous computation delay; resource constraint
For row-pipelined coarse-grained reconfigurable architectures (CGRAs), we present the multi-objective optimization mapping (MOM) algorithm to solve the hardware task partitioning and mapping problem under multiple constraints. MOM constructs a cumulative probability weight function from factors such as the execution delay of each computing node and the degree of dependence between nodes. Subject to reconfigurable cell area and interconnection constraints, the algorithm uses these function values to dynamically adjust the mapping and scheduling order of the ready nodes. When the current row of a reconfigurable cell array (RCA) is fully mapped, MOM moves on to the next row; when an RCA is full, MOM switches to the next RCA; and when a data flow graph (DFG) has been completely mapped, MOM automatically computes parameters such as the number of partition blocks. Experimental results show that, compared with the level-based greedy mapping (LBGM) algorithm, MOM reduces the average total execution cycles by 8.4% on a 4×4 RCA and by 5.3% on a 6×6 RCA. Compared with the split-push kernel mapping (SPKM) algorithm, the reductions are 20.6% (4×4 RCA) and 21.0% (6×6 RCA). These results confirm the effectiveness of the proposed algorithm.
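The abstract describes the row-by-row mapping loop only at a high level. The Python sketch below illustrates that general pattern (rank the ready nodes, fill the current row, wrap to the next row, switch to a new array when one fills, then count the blocks used) under explicit assumptions. It is not MOM itself. The `priority` function is a hypothetical placeholder, not the paper's cumulative probability weight. The names `Node`, `priority`, and `map_dfg` are invented for illustration. The readiness rule is also assumed: a node may be placed only in a row below all of its predecessors. Area is modelled only as row width, and interconnection constraints are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    delay: int                                      # execution delay in cycles
    preds: list[str] = field(default_factory=list)  # names of predecessor nodes


def priority(node: Node, succ_count: dict[str, int], max_delay: int) -> float:
    """Hypothetical stand-in for the paper's cumulative probability weight.

    Favours long-latency nodes and nodes with many direct dependants; the
    real function's exact form is not given in the abstract.
    """
    return node.delay / max_delay + succ_count[node.name]


def map_dfg(nodes: list[Node], rows: int, cols: int):
    """Greedy row-by-row mapping of a DAG onto rows x cols arrays (sketch).

    Assumes `nodes` is a well-formed DAG whose predecessors are all listed.
    Returns (placement, num_blocks); placement maps name -> (block, row, col).
    """
    succ_count = {n.name: 0 for n in nodes}
    for n in nodes:
        for p in n.preds:
            succ_count[p] += 1
    max_delay = max(n.delay for n in nodes)

    global_row: dict[str, int] = {}   # node name -> global row index
    placement: dict[str, tuple[int, int, int]] = {}
    g = 0                             # global row = block * rows + row
    while len(placement) < len(nodes):
        # Assumed rule: data flows downward row by row, so a node is ready
        # only when every predecessor already sits in an earlier row.
        ready = [n for n in nodes
                 if n.name not in placement
                 and all(p in global_row and global_row[p] < g for p in n.preds)]
        # Re-rank the ready set for each row (stable sort keeps input order
        # on ties), then fill the row up to its width; leftovers wait.
        ready.sort(key=lambda n: priority(n, succ_count, max_delay), reverse=True)
        for col, n in enumerate(ready[:cols]):
            global_row[n.name] = g
            placement[n.name] = (g // rows, g % rows, col)
        g += 1                        # next row; g // rows switches arrays
    num_blocks = (g + rows - 1) // rows   # arrays (partition blocks) used
    return placement, num_blocks


if __name__ == "__main__":
    dfg = [Node("a", 1), Node("b", 1), Node("c", 2, ["a", "b"]),
           Node("d", 1, ["a"]), Node("e", 3, ["c", "d"])]
    placement, blocks = map_dfg(dfg, rows=2, cols=2)
    print(placement, blocks)
```

With a 2×2 array, the five-node example fills both rows of the first array with `a, b` and `c, d`. It then places `e` in the first row of a second array, so two blocks are reported. Re-ranking inside the loop is what lets the order of ready nodes change as mapping progresses. A real implementation would replace `priority` with the paper's weight function and add the area and interconnection checks.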