计算机学报
Chinese Journal of Computers
2015
Issue 9
pp. 1825-1837
13 pages
big data%HDFS%Vandermonde code%decentralized dynamic replication%optimized storage
With the arrival of the big data era, HDFS (the Hadoop Distributed File System) is being applied ever more widely. However, it suffers from high overall storage cost, limited scalability, and insufficient node load balancing. This paper therefore proposes a decentralized dynamic replication storage optimization strategy for HDFS based on the Vandermonde code. Given that HDFS is mostly deployed on large clusters of inexpensive hardware, the strategy builds on Vandermonde code optimization and applies decentralized dynamic replication control to systematically improve the computation process, computation mode, and decoding trigger strategy of HDFS file operations. It also sets the check codes dynamically to keep fault tolerance within a desirable range. In addition, it uses Galois finite field theory to comprehensively optimize the encoding and decoding operations and the computation methods of the Vandermonde code. Without affecting the HDFS storage structure, the strategy reduces the time cost and memory pressure of Vandermonde encoding and decoding, saves about 30% of HDFS storage overhead, improves data reliability by about 200%, balances the load across HDFS nodes, and improves decoding recovery efficiency by about 40% on average. Together these form a complete and systematic optimization scheme that offers an effective path for the future development of HDFS.
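For readers unfamiliar with the underlying technique, the following is a minimal sketch of Vandermonde-matrix erasure coding over the Galois field GF(2^8). It is not the paper's implementation. The field polynomial (0x11D), the evaluation points, the non-systematic layout, and all function names (`gf_mul`, `encode`, `decode`, and so on) are assumptions made for illustration only. It shows the general idea that k data symbols can be expanded into k + m fragments and rebuilt from any k survivors.

```python
# Illustrative sketch only: a non-systematic Vandermonde erasure code over GF(2^8).
# All names and parameters here are hypothetical and not taken from the paper.

# Build exp/log tables for GF(2^8) using the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and generator 2 (an assumed, common choice).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate the table so gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply two field elements; addition in GF(2^8) is XOR."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[255 - LOG[a]]


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def vandermonde_row(index, k):
    """Row for fragment `index`: powers 0..k-1 of the point index + 1."""
    return [gf_pow(index + 1, j) for j in range(k)]


def encode(data, m):
    """Expand k data symbols (ints 0..255) into k + m fragments (k + m <= 255)."""
    k = len(data)
    fragments = []
    for i in range(k + m):
        acc = 0
        for coef, symbol in zip(vandermonde_row(i, k), data):
            acc ^= gf_mul(coef, symbol)
        fragments.append(acc)
    return fragments


def decode(survivors, k):
    """Recover the k data symbols from a dict {fragment_index: value} with >= k entries."""
    chosen = sorted(survivors)[:k]
    # Augmented matrix [V_sub | fragment values]; V_sub is invertible because
    # its evaluation points are distinct.
    rows = [vandermonde_row(i, k) + [survivors[i]] for i in chosen]
    for col in range(k):  # Gauss-Jordan elimination over GF(2^8)
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a ^ gf_mul(factor, b) for a, b in zip(rows[r], rows[col])]
    return [rows[r][k] for r in range(k)]
```

As a usage example, `encode([72, 68, 70, 83], 2)` yields six fragments, and passing any four of them (keyed by fragment index) to `decode(..., 4)` returns the original four symbols. A production system would more likely use a systematic code, so that the data blocks are stored unmodified, and would process whole blocks rather than single bytes.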