现代电子技术
MODERN ELECTRONICS TECHNIQUE
2015
Issue 16
pp. 51-55
5 pages in total
MongoDB; MD5; big data; archive document deduplication; GridFS
Addressing the current state of archive storage in the big-data setting, this paper analyzes why stored archive documents end up duplicated and proposes a method for storing archive documents in MongoDB. MongoDB's GridFS is used to handle files of different types and sizes in a uniform way, and three collections are defined to store uploader records, file information records, and chunked file contents, respectively. Deduplication is performed at storage time by checking whether the MD5 checksum of an incoming file matches that of a file already stored, and program code implementing this deduplication is given, which is of practical value. Using a distributed storage database improves the scalability of the archive document storage system. Experiments show that the method effectively removes duplicate archive documents and improves query efficiency.
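The storage scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's code: it assumes an in-memory stand-in for MongoDB (plain Python containers instead of real collections), and the class name `DedupStore`, its field names, and the chunk size are all hypothetical choices made for illustration. It shows only the core idea: file content is stored once per distinct MD5 value, while every upload still gets its own uploader record.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for the three collections in the abstract."""

    def __init__(self, chunk_size=256 * 1024):
        # chunk_size is an arbitrary illustrative value, not taken from the paper.
        self.chunk_size = chunk_size
        self.uploads = []  # uploader records: one entry per upload
        self.files = {}    # file information records, keyed by MD5 hex digest
        self.chunks = {}   # chunked file content, keyed by (md5, chunk index)

    def put(self, uploader, filename, data):
        """Record an upload; store the content only if its MD5 is unseen.

        Returns (md5, is_new), where is_new says whether content was written.
        """
        md5 = hashlib.md5(data).hexdigest()
        is_new = md5 not in self.files
        if is_new:
            count = 0
            for offset in range(0, len(data), self.chunk_size):
                self.chunks[(md5, count)] = data[offset:offset + self.chunk_size]
                count += 1
            self.files[md5] = {"length": len(data), "chunk_count": count}
        # The uploader record is written even for duplicates, so each user
        # keeps a reference to the single stored copy.
        self.uploads.append({"uploader": uploader, "filename": filename, "md5": md5})
        return md5, is_new

    def get(self, md5):
        """Reassemble the file content from its chunks."""
        info = self.files[md5]
        return b"".join(self.chunks[(md5, i)] for i in range(info["chunk_count"]))


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    print(store.put("alice", "report.txt", b"hello world"))
    print(store.put("bob", "copy-of-report.txt", b"hello world"))
    print(len(store.files), len(store.uploads))
```

In this sketch the second upload adds an uploader record but no new file information record or chunks, which is the effect the abstract attributes to the MD5 comparison. A real deployment would replace the three containers with MongoDB collections and would typically look up the MD5 value through an index rather than a Python dictionary.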