信息网络安全
NETINFO SECURITY
2013
Issue 8
pp. 10-12
3 pages in total
duplicate data detection%content defined chunking%extremum defined chunking%fingerprint
Duplicate data detection can substantially reduce the storage volume of data centers, save network bandwidth, and lower construction, operation, and maintenance costs. To overcome the tendency of the Content Defined Chunking (CDC) method to produce overlong blocks, this paper proposes a duplicate data detection algorithm based on Extremum Defined Chunking (EDC). The EDC algorithm first computes the fingerprints of the data in all sliding windows whose right boundary falls within the lower and upper size limits of a data block. It then finds the last fingerprint extremum and takes the end position of the corresponding sliding window as the cut point of the data block. Finally, it computes the hash value of the data block and determines whether the block is a duplicate. Experimental results show that the duplicate data detection rate and disk utilization of the EDC algorithm are 1.48 times and 1.12 times those of the CDC algorithm, respectively, a significant improvement.
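The chunking step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not name the fingerprint function, the window size, the block-size limits, or the kind of extremum, so several things here are assumptions: the polynomial rolling hash (`BASE`, `MOD`), the default values of `window`, `min_size`, and `max_size`, the reading of "last extremum" as the last occurrence of the maximum fingerprint, and the use of SHA-1 as the block hash. The function names `edc_chunks` and `deduplicate` are hypothetical.

```python
import hashlib

# Assumed rolling-hash parameters; the abstract does not specify a fingerprint function.
BASE = 257
MOD = (1 << 61) - 1


def edc_chunks(data: bytes, min_size: int = 64, max_size: int = 256, window: int = 16):
    """Split data into blocks, cutting at the last maximal window fingerprint.

    Only windows whose right boundary lies between min_size and max_size
    bytes from the block start are considered, so no block exceeds max_size.
    """
    if not (0 < window <= min_size <= max_size):
        raise ValueError("need 0 < window <= min_size <= max_size")
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    chunks = []
    start, n = 0, len(data)
    while n - start > min_size:
        first_end = start + min_size          # earliest allowed cut point
        last_end = min(start + max_size, n)   # latest allowed cut point
        # Fingerprint of the window that ends at the earliest cut point.
        fp = 0
        for b in data[first_end - window:first_end]:
            fp = (fp * BASE + b) % MOD
        best_fp, cut = fp, first_end
        for end in range(first_end + 1, last_end + 1):
            # Slide the window one byte: drop the oldest byte, append the newest.
            fp = ((fp - data[end - window - 1] * top) * BASE + data[end - 1]) % MOD
            if fp >= best_fp:  # ">=" keeps the last position that attains the maximum
                best_fp, cut = fp, end
        chunks.append(data[start:cut])
        start = cut
    if start < n:
        chunks.append(data[start:])  # remainder no longer than min_size
    return chunks


def deduplicate(data: bytes, **kwargs):
    """Store each distinct block once, keyed by its hash (SHA-1 assumed here)."""
    store = {}   # digest -> block bytes
    recipe = []  # ordered digests needed to rebuild the input
    for chunk in edc_chunks(data, **kwargs):
        digest = hashlib.sha1(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a digest already present marks a duplicate block
        recipe.append(digest)
    return store, recipe
```

Calling `deduplicate(payload)` returns the unique blocks together with the ordered list of digests, and joining `store[d]` over `recipe` reproduces `payload`. Because a cut point is always chosen inside the `[min_size, max_size]` range, every block is bounded in length, which is the property the abstract contrasts with CDC's overlong blocks.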