计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 7
pp. 802-811
10 pages
燕彩蓉%张洋舜%徐光伟
entity resolution%crowdsourcing%MapReduce programming model%privacy protection%master patient index
Entity resolution finds and clusters records that refer to the same real-world object. Machine algorithms alone are efficient, but their accuracy is hard to guarantee. This paper proposes a hybrid approach to entity resolution that combines machine processing with crowdsourcing. First, a MapReduce-based parallel computing framework excludes record pairs that cannot match, which reduces the number of human intelligence tasks; the remaining ambiguous record pairs are then labeled by human workers. To protect privacy during crowdsourcing, a role-based access control model and a strategy for hiding important information are adopted. The approach and the model are applied to a master patient index building platform for a hospital. Experimental results show that the hybrid approach makes full use of the advantages of both machine and human processing, resolves patient entities with high efficiency and accuracy, and effectively avoids leakage of patient information.
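The two-stage idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the blocking key (birth year), the bigram Jaccard similarity, the thresholds `LOW` and `HIGH`, the record fields, and the `mask` helper are all assumptions chosen for illustration, and plain Python functions stand in for the MapReduce map and reduce phases.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical thresholds (not from the paper): below LOW a pair is discarded,
# at or above HIGH it is auto-matched, in between it becomes a crowd task.
LOW, HIGH = 0.3, 0.9


def bigrams(text):
    """Character bigrams of a name, ignoring case and spaces."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of the bigram sets of two strings."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def map_phase(records):
    """Map step: emit (blocking key, record). Assumed key: birth year."""
    for r in records:
        yield r["dob"][:4], r


def reduce_phase(block):
    """Reduce step: compare pairs inside one block and label the survivors."""
    for a, b in combinations(block, 2):
        score = jaccard(a["name"], b["name"])
        if score >= HIGH:
            yield "match", a["id"], b["id"]
        elif score >= LOW:
            yield "crowd", a["id"], b["id"]
        # score < LOW: pair is pruned and never reaches a human worker


def mask(record):
    """Hide sensitive fields before a record is shown to crowd workers."""
    phone = record["phone"]
    return {
        "name": record["name"],
        "dob": record["dob"][:4] + "-**-**",
        "phone": "*" * (len(phone) - 2) + phone[-2:],
    }


def run(records):
    """Group by blocking key (the shuffle), then reduce each block."""
    blocks = defaultdict(list)
    for key, rec in map_phase(records):
        blocks[key].append(rec)
    out = {"match": [], "crowd": []}
    for block in blocks.values():
        for label, i, j in reduce_phase(block):
            out[label].append((i, j))
    return out
```

Only the pairs labeled `crowd` would be turned into human intelligence tasks, and each record in such a task would pass through `mask` first. A role-based access control layer, which the abstract also mentions, is not modeled here.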