CAJ | Academic paper

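Sentence similarity computation plays an important role in many areas of natural language processing. Some traditional methods consider only surface information such as word form, sentence length, and word order, and ignore the deeper semantic information of a sentence, while other methods that do consider sentence semantics perform poorly in practical use. Building on the vector space model, a relation vector model is proposed that considers both sentence structure and semantic information. The model takes into account the collocation relations among the keywords that make up a sentence and the synonym information of those keywords. This information reflects the local structural components of the sentence and the associations among them, and therefore captures the sentence's structure and semantics more fully. With the relation vector model at its core, a sentence similarity computation method is proposed. The algorithm is also applied to the automatic generation of summaries for trending online news, where it excludes sentences with similar meanings from the summary to avoid redundancy. Experimental results show that, when computing sentence similarity in online news, the relation vector model algorithm improves the accuracy of sentence similarity computation and also lowers its time complexity, compared with an algorithm that considers word order and semantics.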
Sentence similarity computation is important in all fields of natural language processing. Some traditional algorithms compare sentences only by surface features such as shared words, sentence length, and word order, and do not consider deeper semantic information; other methods that do consider sentence semantics perform unsatisfactorily in practice. Therefore, a relation vector model based on the vector space model is presented that takes into account both sentence structure and semantic information. The model combines the collocation relations between the keywords of a sentence with the synonym information of those keywords, which reflects the local structural components of the sentence as well as the correlations among them, and therefore better reflects the structure and semantics of the sentence. A sentence similarity algorithm based on the relation vector model is put forward. The algorithm is applied to a network news summary generation algorithm in order to avoid redundancy. The experimental results show that, compared with an algorithm that considers word order and semantics, the relation vector model algorithm not only improves the accuracy of sentence similarity calculation but also reduces its time complexity.
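The abstract does not give the exact construction of the relation vector, so the following is only a minimal sketch of the general idea under stated assumptions: keywords are normalized through a synonym table, each sentence is represented by counts of keyword pairs that occur close together (a stand-in for collocation relations), and two sentences are compared by cosine similarity. The synonym table, stopword list, `window` size, `threshold` value, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical synonym table: maps a word to a canonical keyword (assumption).
SYNONYMS = {"automobile": "car", "purchased": "buy", "bought": "buy"}
# Hypothetical stopword list: words that are not treated as keywords (assumption).
STOPWORDS = {"a", "an", "the", "he", "she", "of", "to", "in"}


def keywords(sentence):
    """Lowercase, strip punctuation, drop stopwords, and map synonyms to one form."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    return [SYNONYMS.get(t, t) for t in tokens if t and t not in STOPWORDS]


def relation_vector(sentence, window=2):
    """Count unordered keyword pairs where the second keyword is among the
    next `window` keywords after the first (a simple proxy for collocation)."""
    kws = keywords(sentence)
    pairs = Counter()
    for i, a in enumerate(kws):
        for b in kws[i + 1:i + 1 + window]:
            if a != b:
                pairs[tuple(sorted((a, b)))] += 1
    return pairs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarity(s1, s2):
    """Sentence similarity as the cosine of the two relation vectors."""
    return cosine(relation_vector(s1), relation_vector(s2))


def drop_redundant(sentences, threshold=0.5):
    """Keep a sentence only if it is not too similar to any sentence already kept."""
    kept = []
    for s in sentences:
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

In this sketch, `drop_redundant` mirrors the summarization use described in the abstract: a candidate sentence is discarded when its similarity to an already selected sentence reaches the threshold. A real system would need proper word segmentation, a thesaurus, and a tuned threshold, none of which the abstract specifies.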