计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2010
Issue 5
pp. 21-23, 58
4 pages in total
text retrieval%feature fusion%frequency matrix%singular value decomposition (SVD)
In this paper, a new retrieval algorithm for English texts is proposed. First, each English text is mapped to a frequency matrix of order 26, and the dimensionality of the text representation space is reduced through singular value decomposition. Second, the features of the first and second singular value components are fused to obtain complex feature vectors that reflect both the statistical frequency of letters and the sequential structure of the characters in the text. Finally, the cosine similarity between vectors is used to measure the similarity between the query and the documents. Experimental comparison shows that the algorithm achieves good results and outperforms the classic LSA retrieval algorithm in both retrieval precision and computational efficiency.
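The pipeline described in the abstract can be sketched roughly as follows. This is only one possible reading, not the authors' implementation: the abstract does not say how the order-26 frequency matrix is built or how the two singular components are fused. The sketch assumes that entry (i, j) is the relative frequency of letter i immediately followed by letter j (non-letters are dropped, so pairs may span word boundaries), and that the fused vector uses the scaled first left singular vector as its real part and the scaled second one as its imaginary part. All function names are hypothetical.

```python
import numpy as np


def frequency_matrix(text):
    """Assumed construction: 26x26 matrix of adjacent-letter pair frequencies."""
    letters = [ord(c) - ord("a") for c in text.lower() if "a" <= c <= "z"]
    m = np.zeros((26, 26))
    for a, b in zip(letters, letters[1:]):
        m[a, b] += 1
    total = m.sum()
    return m / total if total > 0 else m


def _fix_sign(v):
    """Singular vectors are defined only up to sign; make the largest entry positive."""
    k = np.argmax(np.abs(v))
    return v if v[k] >= 0 else -v


def complex_feature(text):
    """Assumed fusion: first component as real part, second as imaginary part."""
    u, s, _ = np.linalg.svd(frequency_matrix(text))
    first = s[0] * _fix_sign(u[:, 0])
    second = s[1] * _fix_sign(u[:, 1])
    return first + 1j * second


def cosine_similarity(a, b):
    """Magnitude of the normalized complex inner product, in [0, 1]."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(abs(np.vdot(a, b)) / (na * nb))


def rank(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = complex_feature(query)
    scored = [(cosine_similarity(q, complex_feature(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "singular value decomposition of a frequency matrix",
        "the weather is pleasant today",
    ]
    for score, doc in rank("matrix decomposition", docs):
        print(f"{score:.3f}  {doc}")
```

Each document is reduced to a 26-dimensional complex vector regardless of its length, which is consistent with the efficiency claim in the abstract, although the precision comparison with LSA cannot be reproduced from this sketch.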