西北工业大学学报
JOURNAL OF NORTHWESTERN POLYTECHNICAL UNIVERSITY
2014
Issue 4
pp. 658-663
6 pages
王丽芳%张志珂%蒋泽军%蔡小斌%彭成章
重复数据删除集群%无状态数据路由算法%文件路径%存储使用量
calculations%cluster computing%data transfer%design%efficiency%experiments%hard disk storage%routing algorithms%simulators%software architecture%deduplication cluster%directory%stateless data routing algorithm%storage utilization
重复数据删除集群是解决不断增长的海量数据备份需求的一种有效方法。它的关键问题是数据路由策略,即如何把数据合理分配到集群内的各个节点。目前的数据路由策略利用文件或者数据段的最小数据块签名计算路由目标节点,称作MCS( minimum chunk signature)数据路由策略。当重复数据删除集群规模较小时,这种方法的存储使用量接近单节点重复数据删除。但是,当集群规模较大时,它的存储使用量远远劣于单节点重复数据删除。为了降低重复数据删除集群的存储使用量,提出一种基于路径的重复数据删除集群的数据路由策略,称作DRSD( data routing strategy based on di-rectories)。实验结果表明,对于各种不同的节点数量,DRSD的重复数据删除率都明显高于MCS,并且接近单节点重复数据删除。当节点数量是64时,DRSD的重复数据删除率比MCS高35%。
A deduplication cluster is an effective way to meet the growing demand for massive data backup. Its key problem is the data routing strategy, that is, how to distribute data among the nodes of the cluster. Existing data routing strategies use the minimum chunk signature (MCS) of a file or data segment to compute the target node. When the deduplication cluster is small, the storage usage of MCS approaches that of single-node deduplication. However, when the cluster is large, its storage usage is far worse than that of single-node deduplication. To reduce the storage usage of the deduplication cluster, we propose a data routing strategy based on directories, which we call DRSD (Data Routing Strategy Based on Directories). Experimental results and their analysis show preliminarily that, for various numbers of nodes, the deduplication ratios obtained with DRSD are clearly higher than those obtained with MCS and approach those obtained with single-node deduplication. When the number of nodes is 64, the deduplication ratio obtained with DRSD is 35% higher than that obtained with MCS.
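The two routing ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the chunking method, the hash function, or how DRSD maps a directory to a node. Fixed-size chunking, SHA-1 signatures, the modulo mapping, and the function names `route_mcs` and `route_by_directory` are all assumptions made here for illustration, and `route_by_directory` is only a stand-in for the general idea of directory-based routing.

```python
import hashlib
import posixpath


def chunk_signature(chunk: bytes) -> int:
    """Fingerprint a chunk as an integer (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")


def fixed_size_chunks(data: bytes, size: int = 4096) -> list[bytes]:
    """Split data into fixed-size chunks (assumed; the abstract names no chunking method)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def route_mcs(data: bytes, num_nodes: int) -> int:
    """Stateless routing by the minimum chunk signature of a file or segment."""
    min_sig = min(chunk_signature(c) for c in fixed_size_chunks(data))
    return min_sig % num_nodes


def route_by_directory(path: str, num_nodes: int) -> int:
    """Hypothetical directory-based routing: hash the file's parent directory."""
    parent = posixpath.dirname(posixpath.normpath(path))
    digest = hashlib.sha1(parent.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes
```

Under these assumptions, both functions are stateless: the target node depends only on the input and the node count, so no routing table has to be shared across the cluster. The directory variant sends all files under one directory to the same node, which is one plausible way to keep related data together.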