计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 2
pp. 198-203
6 pages
sentence similarity%relation vector model%sentence syntax%sentence semantics
Sentence similarity computation plays an important role in many areas of natural language processing. Some traditional methods consider only surface information such as word form, sentence length, and word order, and ignore the deeper semantic information of a sentence; other methods that do consider sentence semantics perform poorly in practice. Building on the vector space model, this paper proposes a relation vector model that takes both sentence structure and semantic information into account. The model considers the collocation relations between the keywords that make up a sentence, together with the synonym information of those keywords. This information reflects the local structural components of the sentence and the relations among them, and therefore better captures the sentence's structure and semantics. With the relation vector model at its core, a sentence similarity algorithm is put forward. The algorithm is also applied to automatic summary generation for hot online news, where it removes sentences with similar meaning from the summary to avoid redundancy. Experimental results show that, when computing sentence similarity in online news, the relation vector model algorithm not only improves the accuracy of sentence similarity computation but also reduces its time complexity, compared with an algorithm that considers word order and semantics.
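The abstract does not give the model's formulas, so the following is only an illustrative sketch of the general idea it describes, not the authors' algorithm. It assumes that a "relation" is an unordered pair of keywords occurring close together, that synonyms are merged through a lookup table, and that two relation vectors are compared by cosine similarity. The names `SYNONYMS`, `STOPWORDS`, `relation_vector`, and the `window` parameter are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical synonym table: maps a keyword to a canonical form.
SYNONYMS = {"automobile": "car", "purchase": "buy", "bought": "buy"}
# Hypothetical stopword list: tokens that are not treated as keywords.
STOPWORDS = {"a", "an", "the", "he", "she", "is", "of"}


def keywords(sentence):
    """Lowercase, split on whitespace, drop stopwords, merge synonyms."""
    tokens = sentence.lower().split()
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]


def relation_vector(sentence, window=2):
    """Count unordered keyword pairs that occur within `window` positions.

    Assumption: such pairs stand in for the keyword collocation
    relations mentioned in the abstract.
    """
    kws = keywords(sentence)
    pairs = Counter()
    for i, a in enumerate(kws):
        for b in kws[i + 1 : i + 1 + window]:
            if a != b:
                pairs[frozenset((a, b))] += 1
    return pairs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity(s1, s2):
    """Similarity of two sentences under the assumed relation vectors."""
    return cosine(relation_vector(s1), relation_vector(s2))
```

Under these assumptions, `similarity("he bought a car", "she purchase an automobile")` compares two sentences whose keywords reduce to the same pair, while sentences with no keyword pair in common score zero. In a summarization setting, a candidate sentence could be dropped when its similarity to an already selected sentence exceeds a chosen threshold.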