计算机辅助设计与图形学学报
JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS
2014
Issue 11
pp. 2079-2090
12 pages in total
吴子旭%付方发%路禹%王进祥
众核%容错%拓扑重配置%消息传递接口
manycore%fault tolerance%topology reconfiguration%message passing interface
系统故障恢复时间是众核系统容错的一项重要指标。为加快系统故障恢复,在基于消息传递模型的众核系统中提出一种快速的拓扑重配置容错方法。首先根据物理拓扑故障情况为每个核心定义映射区域,利用匈牙利算法快速构建初始解;然后通过限制交错映射的发生,采用禁忌搜索在初始解的基础上快速优化,获得最终重配置映射解;最后根据重配置映射解更新各运算节点上的节点映射关系表完成拓扑重配置,实现众核系统的核级容错。实验结果表明,该方法能够快速找到优化的拓扑重配置方案并成功地完成系统恢复,具有较低的容错时间开销。
System fault-recovery time is an important metric for fault tolerance in manycore systems. To accelerate recovery from faults, a fast topology reconfiguration method is proposed for fault tolerance in manycore systems based on the message passing model. First, a mapping domain is defined for each core according to the fault condition of the physical topology, and the Hungarian algorithm is used to quickly build an initial solution. Second, with twisted mappings restricted, tabu search quickly optimizes the initial solution to obtain the final reconfiguration mapping solution. Finally, the node mapping table on each computational node is updated according to the reconfiguration mapping solution, which completes the topology reconfiguration and realizes core-level fault tolerance for the manycore system. Experimental results show that the proposed method quickly finds an optimized topology reconfiguration solution and successfully recovers the system, with low time overhead for fault tolerance.
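The first step of the abstract (building an initial remapping with the Hungarian algorithm) can be illustrated with a small sketch. It is not the authors' implementation. It assumes a 2D mesh with spare columns, a cost equal to the Manhattan distance between a logical node and a candidate physical core, and a mapping domain modeled as a distance radius. The names `hungarian`, `initial_mapping`, `radius`, and `BIG` are hypothetical, and the tabu search refinement and mapping-table update are not shown.

```python
from itertools import product

BIG = 10**6  # assumed penalty for cores outside a node's mapping domain


def hungarian(cost):
    """Minimum-cost assignment of n rows to m >= n columns, O(n^2 * m)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (m + 1)    # column potentials
    p = [0] * (m + 1)    # p[j] = row matched to column j (1-indexed, 0 = free)
    way = [0] * (m + 1)  # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def initial_mapping(rows, cols, spare_cols, faulty, radius=2):
    """Map each logical node to a healthy physical core (assumed cost model)."""
    logical = list(product(range(rows), range(cols)))
    physical = [c for c in product(range(rows), range(cols + spare_cols))
                if c not in faulty]
    if len(logical) > len(physical):
        raise ValueError("not enough healthy cores")
    cost = []
    for lr, lc in logical:
        row = []
        for pr, pc in physical:
            d = abs(lr - pr) + abs(lc - pc)
            row.append(d if d <= radius else BIG)  # outside the mapping domain
        cost.append(row)
    assignment = hungarian(cost)
    mapping = {logical[i]: physical[j] for i, j in enumerate(assignment)}
    total = sum(cost[i][j] for i, j in enumerate(assignment))
    return mapping, total


if __name__ == "__main__":
    # 3x3 logical mesh on a 3x4 physical mesh; core (1, 1) has failed.
    mapping, total = initial_mapping(3, 3, 1, {(1, 1)})
    for node, core in sorted(mapping.items()):
        print(node, "->", core)
    print("total displacement:", total)
```

In this sketch the radius keeps each node's candidate cores local, which loosely mirrors the per-core mapping domain described in the abstract. A refinement step such as tabu search would then start from the returned `mapping`.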