智能系统学报
CAAI TRANSACTIONS ON INTELLIGENT SYSTEMS
2015
Issue 2
pp. 261-266
6 pages
semantics%sparsity%context%knowledge base%concept clusters%multi-topic extraction%K-means%MEABCC
Multi-topic documents are common in the real world, and multi-topic extraction is widely used in fields such as information retrieval and library and information science. Most traditional topic extraction algorithms extract a single topic for the text as a whole, and they suffer from a lack of semantic information and from high-dimensional, sparse vectors. Using HowNet (《知网》) as the knowledge base, this paper represents a text as a concept vector, merges synonyms and disambiguates polysemous words according to the semantics and context of each concept, and computes semantic similarity from the semantic relations among concepts. On this basis, a multi-topic extraction algorithm based on concept clusters (MEABCC) is proposed, which clusters the concepts to obtain multiple topic clusters. When K-means is used for concept clustering, it is improved with a "preset seeds" method to compensate for the traditional algorithm's sensitivity to initial centers, which causes unstable time and space overhead and large fluctuations in the results. Experimental results show that the algorithm achieves good precision, recall, and F1 values.
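The "preset seeds" idea can be illustrated with a minimal sketch. The abstract does not say how MEABCC chooses its seeds or how it measures distance between concepts, so everything below is an assumption made for illustration: the function name `kmeans_with_seeds` is hypothetical, the points are toy 2-D stand-ins for concept vectors, and Euclidean distance replaces the paper's HowNet-based semantic similarity. The only point shown is that starting Lloyd's iteration from fixed seeds, rather than random centers, makes the result reproducible from run to run.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans_with_seeds(points, seeds, max_iter=100):
    """Lloyd's K-means started from caller-supplied seeds, not random centers."""
    centers = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Toy stand-ins for concept vectors: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# Hypothetical preset seeds, one per expected topic (an assumption, not the paper's rule).
seeds = [(0.0, 0.0), (9.0, 9.0)]
labels, centers = kmeans_with_seeds(points, seeds)
```

Because the seeds are fixed, repeated calls on the same input return the same clusters, which is the stability property the abstract attributes to the improved K-means step.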