系统工程理论与实践
Systems Engineering—Theory & Practice
2015
Issue 4
pp. 997-1004
数据清理 实体解析 异质网络 联合聚类
data cleaning; entity resolution; heterogeneous networks; co-clustering
实体解析问题是数据挖掘数据清理过程中的基本问题.异质网络数据的大量涌现,要求能够针对包含多种类型对象的数据同时进行实体解析.针对包含两种对象的实体解析问题,提出了一种基于联合聚类思想的协同实体解析算法.将两种对象分为决定对象和辅助对象,提出了一个基于联合聚类思想的两阶段协同实体解析框架,能够同时获得决定对象和辅助对象的各自聚类结果,其中每一个类包含的若干实体参考表示是对现实世界中同一实体的共同引用.最后对提出的算法进行了数值实验.
Entity resolution is a fundamental problem in data cleaning for data mining. With the emergence of large volumes of heterogeneous network data, which contain objects of multiple types, it is necessary to resolve entities of different types simultaneously. In this paper, we propose a co-clustering-based algorithm for collective relational entity resolution in data sets containing two types of objects. Specifically, we first divide the objects into dominant objects and assistant objects according to their roles in entity resolution. Then, we present a two-phase collective entity resolution framework based on co-clustering, which obtains the clustering results for the dominant objects and the assistant objects simultaneously; the references assigned to the same cluster are treated as references to the same real-world entity. Finally, numerical experiments on the proposed algorithm are reported.
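The abstract does not give the details of the algorithm, so the following is only a minimal illustrative sketch of the general idea of a two-phase resolution over two object types, not the authors' method. It substitutes a simple stand-in: union-find clustering with string similarity for the dominant references, then clustering of the assistant references using the dominant clusters they are linked to. All names (`co_resolve`, `name_thr`, `overlap_thr`), the thresholds, and the toy author/venue data are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find(parent, x):
    """Return the root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(items, same):
    """Merge every pair (a, b) for which same(a, b) is true; map item -> root."""
    parent = {x: x for x in items}
    for a, b in combinations(items, 2):
        if same(a, b):
            parent[find(parent, a)] = find(parent, b)
    return {x: find(parent, x) for x in items}


def name_sim(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def co_resolve(links, name_thr=0.85, overlap_thr=0.5):
    """links: list of (dominant_reference, assistant_reference) pairs."""
    dominant = sorted({d for d, _ in links})
    assistant = sorted({a for _, a in links})

    # Phase 1 (assumed): cluster dominant references by name similarity only.
    dom = cluster(dominant, lambda a, b: name_sim(a, b) >= name_thr)

    # Phase 2 (assumed): cluster assistant references by name similarity or by
    # Jaccard overlap of the dominant clusters they are linked to.
    neigh = {a: {dom[d] for d, x in links if x == a} for a in assistant}

    def same_assistant(a, b):
        jaccard = len(neigh[a] & neigh[b]) / len(neigh[a] | neigh[b])
        return name_sim(a, b) >= name_thr or jaccard >= overlap_thr

    asst = cluster(assistant, same_assistant)
    return dom, asst


# Hypothetical toy data: author references (dominant) linked to venue
# references (assistant).
links = [
    ("John Smith", "VLDB"),
    ("john smith", "Very Large Data Bases"),
    ("Wei Wang", "KDD"),
]
dom, asst = co_resolve(links)
```

In this toy run the two spellings of the author fall into one dominant cluster, which in turn lets the two venue strings be merged even though their names are dissimilar. A fuller co-clustering scheme would presumably iterate so that the assistant clusters also feed back into the dominant clustering; that feedback loop is omitted here.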