计算机学报
Chinese Journal of Computers
2015
Issue 9
pp. 1739-1754
16 pages in total
孙琛琛%申德荣%寇月%聂铁铮%于戈
joint entity resolution%similarity propagation%structure-based similarity%entity data object relationship graph
We propose a graph-based, iterative approach to joint entity resolution. First, an entity data object relationship graph is built from the input dataset, which consists of multiple classes of related data objects. The approach uses a hybrid similarity, combining a structure-based similarity computed over semantic paths with an attribute-based similarity, to decide whether two data objects match. It then merges each matched pair and contracts the neighborhood of the merged node in the graph; both operations enrich the semantics of that neighborhood. The enriched semantics may generate new candidate data object pairs in the neighborhood, which are resolved in later iterations. This generation of new candidate pairs is called similarity propagation, and it makes resolution an iterative process. As iteration proceeds, the semantics of the object graph grow richer, which improves the accuracy of entity resolution. The experimental evaluation shows that the proposed approach achieves higher accuracy than existing joint entity resolution approaches and relationship-based single-class entity resolution approaches.
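The match, merge, contract, and propagate loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: Jaccard overlap of direct neighbor sets stands in for the semantic-path-based structural similarity, token-set Jaccard stands in for the attribute similarity, and the names `resolve`, `alpha`, and `threshold` are assumptions introduced only for this illustration.

```python
from collections import deque
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(types, tokens, edges, alpha=0.5, threshold=0.5):
    """Iteratively merge matching objects of the same class.

    types:  {object_id: class name}
    tokens: {object_id: set of attribute tokens}
    edges:  iterable of (object_id, object_id) relationships
    Returns {object_id: representative object_id}.
    """
    tokens = {o: set(t) for o, t in tokens.items()}  # private copies
    nbrs = {o: set() for o in types}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    rep = {o: o for o in types}

    def find(o):
        # Follow merge links to the surviving node.
        while rep[o] != o:
            o = rep[o]
        return o

    # Initial candidates: every pair of objects of the same class.
    queue = deque(
        (a, b) for a, b in combinations(sorted(types), 2) if types[a] == types[b]
    )
    while queue:
        a, b = (find(o) for o in queue.popleft())
        if a == b:
            continue
        # Hybrid similarity: attribute part plus (simplified) structural part.
        attr_sim = jaccard(tokens[a], tokens[b])
        struct_sim = jaccard(nbrs[a] - {b}, nbrs[b] - {a})
        if alpha * attr_sim + (1 - alpha) * struct_sim < threshold:
            continue
        # Merge b into a and contract the graph around the merged node.
        rep[b] = a
        tokens[a] |= tokens.pop(b)
        for n in nbrs.pop(b):
            nbrs[n].discard(b)
            if n != a:
                nbrs[n].add(a)
                nbrs[a].add(n)
        # Similarity propagation: same-class neighbors of the merged node
        # become (new or repeated) candidate pairs.
        around = sorted(nbrs[a])
        queue.extend(
            (x, y) for x, y in combinations(around, 2) if types[x] == types[y]
        )
    return {o: find(o) for o in types}


# Hypothetical toy data: two author records and two paper records.
types = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper"}
tokens = {
    "a1": {"j", "smith"},
    "a2": {"john", "smith"},
    "p1": {"joint", "entity", "resolution"},
    "p2": {"joint", "entity", "resolution"},
}
edges = [("a1", "p1"), ("a2", "p2")]
result = resolve(types, tokens, edges, alpha=0.5, threshold=0.4)
print(result)
```

In this toy input the two author records do not match on their own (little attribute overlap, no shared neighbors). Once the two paper records are merged and the graph is contracted, both authors share the merged paper as a neighbor, so the re-enqueued author pair passes the threshold on the second attempt.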