Computer Engineering (计算机工程)
2009, No. 21, pp. 85-87 (3 pages)
Keywords: metadata; duplicate record detection; N-Gram method; similarity
Detecting and managing duplicate metadata records in a federated digital library is key to ensuring metadata quality and improving federated retrieval services. Existing duplicate record detection methods for federated digital libraries rely on centralized computation and have low accuracy. This paper proposes a fast, efficient method for detecting approximately duplicate metadata records, based on an improved N-Gram method and suited to relatively large federated digital libraries. Simulation results show that the method effectively improves duplicate detection performance and speeds up detection.
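The abstract does not describe the authors' specific improvements to the N-Gram method, so the following is only a minimal sketch of a *plain* N-Gram baseline for approximate duplicate detection, not the paper's algorithm. It assumes character bigrams, a Dice coefficient over shared n-grams, hypothetical per-field weights (`title`, `creator`), and a hypothetical similarity threshold of 0.8; the function names are illustrative.

```python
from collections import Counter


def ngrams(text: str, n: int = 2) -> Counter:
    """Return the multiset of character n-grams of a normalized string."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    s = " ".join(text.lower().split())
    if len(s) < n:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over shared character n-grams, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    shared = sum((ga & gb).values())  # multiset intersection keeps minimum counts
    return 2 * shared / total


def record_similarity(r1: dict, r2: dict, weights: dict, n: int = 2) -> float:
    """Weighted average of per-field n-gram similarity (weights are hypothetical)."""
    total_w = sum(weights.values())
    score = sum(w * ngram_similarity(r1.get(f, ""), r2.get(f, ""), n)
                for f, w in weights.items())
    return score / total_w


def find_duplicates(records, weights, threshold=0.8, n=2):
    """Return index pairs whose weighted similarity meets the threshold.

    This is a naive O(m^2) pairwise scan over all records.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j], weights, n) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    records = [
        {"title": "Duplicate Record Detection in Digital Libraries", "creator": "Zhang San"},
        {"title": "Duplicate record detection in digital libraries.", "creator": "Zhang, San"},
        {"title": "Federated Search Services", "creator": "Li Si"},
    ]
    weights = {"title": 0.7, "creator": 0.3}
    print(find_duplicates(records, weights))  # → [(0, 1)]
```

The pairwise scan is the simplest correct formulation. For the large federated collections the paper targets, a practical system would probably avoid comparing every pair, for example by blocking or sorting records on a key first. Which strategy the authors actually use is not stated in the abstract.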