软件学报
JOURNAL OF SOFTWARE
2014
Issue 9
2076-2087
12 pages in total
怀宝兴%宝腾飞%祝恒书%刘淇
named entity linking%probabilistic topic models%Wikipedia
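Named entity linking (NEL) is the process of linking a given named entity in a document to an unambiguous entity in a knowledge base; it includes merging synonymous entities and disambiguating ambiguous ones. The technique can improve the information filtering ability of practical applications such as online recommender systems and Internet search engines. However, the rapid growth in the number of entities poses great challenges for entity disambiguation, making it increasingly difficult for current NEL techniques to meet the demand for linking accuracy. Words and entities in documents often carry different semantic topics (for example, "Apple" can denote either a fruit or an electronics brand), while the words and entities within the same document should share similar topics. The paper therefore proposes modeling documents and disambiguating entities at the semantic level, and on this basis designs a complete NEL method based on probabilistic topic models. First, a knowledge base is built from Wikipedia. Next, a probabilistic topic model maps words and named entities into the same topic space, and each named entity in a given text is linked to an unambiguous named entity in the knowledge base according to the entities' position vectors in the topic space. Finally, extensive experiments are conducted on a real data set and compared against standard methods. The results show that the proposed framework resolves entity ambiguity well and achieves higher entity linking accuracy.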
Named entity linking (NEL) links a given named entity in a document to an unambiguous entity in a knowledge base, and thus plays an important role in a wide range of Internet services, such as online recommender systems and Web search engines. However, with the explosive growth of online information and applications, the large number of online entities makes it increasingly difficult for traditional NEL solutions to achieve the required linking accuracy. Moreover, entities are usually associated with different semantic topics (e.g., the entity "Apple" could be either a fruit or a brand), whereas the latent topic distributions of words and entities in the same document should be similar. To address this issue, this paper proposes a novel topic modeling approach to named entity linking. Unlike existing work, the new approach provides a comprehensive framework for NEL and can uncover the semantic relationship between documents and named entities. Specifically, it first builds a knowledge base of unambiguous entities with the help of Wikipedia. Then, it proposes a novel bipartite topic model to capture the latent topic distributions of entities and documents. Given a new named entity, the approach can thus link it to an unambiguous entity in the knowledge base by calculating their semantic similarity with respect to latent topics. Finally, the paper reports extensive experiments on a real-world data set to evaluate the approach. Experimental results show that the proposed approach outperforms other state-of-the-art baselines by a significant margin.
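The linking step described in the abstract, choosing the knowledge-base entity whose latent topics best match those of the mention's document, can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not say which similarity measure is used, so cosine similarity is assumed here, and the three-topic space, the topic vectors, and the names `cosine`, `link_entity` and `knowledge_base` are all hypothetical, chosen only to mirror the "Apple" example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_entity(mention_topics, candidates):
    """Return the name of the candidate whose topic vector is closest to the mention's.

    mention_topics: topic distribution of the document containing the mention.
    candidates: dict mapping an unambiguous entity name to its topic distribution.
    """
    return max(candidates, key=lambda name: cosine(mention_topics, candidates[name]))


# Hypothetical topic space with three topics: [food, technology, geography].
# These numbers are made up for illustration; a real system would infer them
# with a topic model trained on Wikipedia, as the abstract describes.
knowledge_base = {
    "Apple (fruit)": [0.85, 0.05, 0.10],
    "Apple (brand)": [0.05, 0.90, 0.05],
}

# A document that is mostly about technology mentions "Apple".
doc_topics = [0.10, 0.80, 0.10]
best = link_entity(doc_topics, knowledge_base)
print(best)
```

Under these assumed vectors, a technology-heavy document links the mention "Apple" to the brand entry, while a food-heavy document would link it to the fruit entry.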