计算机技术与发展
COMPUTER TECHNOLOGY AND DEVELOPMENT
2015, No. 1, pp. 6-10 (5 pages)
Keywords: retrieval model; vector space model; ontology; similarity
The vector space model (VSM), which computes the relevance between documents from term frequency, is one of the most widely used information retrieval models. Although it meets users' basic retrieval needs, its effectiveness remains unsatisfactory for users with more demanding requirements. Building on the VSM, this paper first uses a domain ontology and an upper ontology to compute the similarity between feature terms; from these similarities it derives words related to each query term, which are then taken into account when computing term frequency (TF) and inverse document frequency (IDF). It then introduces two concepts, word-order relatedness and word-adjacency relatedness, so that the positional relationships between feature terms are also considered. Experimental results show that the proposed model improves precision considerably over the original VSM. This indicates that, because it considers not only terms semantically similar to the original query terms but also word-order and word-adjacency information, the proposed retrieval model meets users' retrieval requirements better than the original VSM.
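The abstract names three extensions to the VSM but does not give the paper's formulas, so the following Python sketch is only one hypothetical way to combine them: ontology-weighted related terms folded into TF and IDF, a word-order relatedness score, and a word-adjacency relatedness score. `ONTOLOGY_SIM` is a made-up similarity table standing in for the domain- and upper-ontology similarity computation. The similarity threshold, the smoothed IDF, the length normalization, the pairwise definitions of order and adjacency relatedness, and the weights `alpha` and `beta` are all assumptions for illustration, not the authors' definitions. Documents are assumed to be already segmented into terms (Chinese text would first need word segmentation).

```python
import math
from collections import Counter

# HYPOTHETICAL similarity table standing in for the paper's ontology-based
# term similarity (domain ontology + upper ontology). Values are made up.
ONTOLOGY_SIM = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.6,
}


def sim(a, b):
    """Symmetric similarity lookup; a term is fully similar to itself."""
    if a == b:
        return 1.0
    return ONTOLOGY_SIM.get((a, b), ONTOLOGY_SIM.get((b, a), 0.0))


def related_terms(q, vocab, threshold=0.5):
    """Map each vocabulary term whose similarity to q reaches the (assumed)
    threshold, q itself included if present, to that similarity."""
    return {t: sim(q, t) for t in vocab if sim(q, t) >= threshold}


def expanded_tf(q, counts, related):
    """Term frequency of q in which each related term t counts with weight sim(q, t)."""
    return sum(weight * counts[t] for t, weight in related.items())


def expanded_idf(related, docs):
    """Smoothed IDF (an assumption); a document counts toward df if it
    contains q or any of its related terms."""
    df = sum(1 for d in docs if any(t in d for t in related))
    return math.log((1 + len(docs)) / (1 + df)) + 1


def order_relatedness(query, doc):
    """Hypothetical definition: fraction of consecutive query-term pairs (both
    present in doc) whose first occurrences keep the query's order."""
    first = {}
    for i, tok in enumerate(doc):
        first.setdefault(tok, i)
    pairs = [(a, b) for a, b in zip(query, query[1:]) if a in first and b in first]
    if not pairs:
        return 0.0
    return sum(first[a] < first[b] for a, b in pairs) / len(pairs)


def adjacency_relatedness(query, doc):
    """Hypothetical definition: fraction of consecutive query-term pairs that
    occur side by side, in query order, somewhere in doc."""
    pairs = list(zip(query, query[1:]))
    if not pairs:
        return 0.0
    bigrams = set(zip(doc, doc[1:]))
    return sum((a, b) in bigrams for a, b in pairs) / len(pairs)


def score(query, doc, docs, alpha=0.2, beta=0.2):
    """Length-normalized expanded TF-IDF plus weighted order/adjacency scores."""
    if not doc:
        return 0.0
    vocab = {t for d in docs for t in d}
    counts = Counter(doc)
    tfidf = 0.0
    for q in query:
        rel = related_terms(q, vocab)
        tfidf += expanded_tf(q, counts, rel) / len(doc) * expanded_idf(rel, docs)
    return (tfidf
            + alpha * order_relatedness(query, doc)
            + beta * adjacency_relatedness(query, doc))


if __name__ == "__main__":
    docs = [
        ["car", "repair", "guide"],
        ["repair", "guide", "for", "your", "car"],
        ["automobile", "repair", "manual"],
        ["repair", "manual", "guide"],
        ["cooking", "recipes"],
    ]
    query = ["car", "repair"]
    # Rank documents by descending score.
    for d in sorted(docs, key=lambda d: score(query, d, docs), reverse=True):
        print(round(score(query, d, docs), 3), " ".join(d))
```

In this toy run, the document containing only `automobile` still earns credit for the query term `car`, and of the two documents holding exactly `car` and `repair`, the one where they appear in query order and side by side ranks higher. Keeping the positional scores as separate additive terms makes it easy to isolate their effect by setting `alpha = beta = 0`.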