Journal of Southeast University (Natural Science Edition)
2014, No. 2, pp. 261-265 (5 pages)
Keywords: semantic search; multi-classification semantic analysis (MSA); term vector database (TVDB); personalization algorithm
To further improve the precision of semantic search and the user experience, a semantic search approach based on multi-classification semantic analysis (MSA) and personalization is presented. First, documents are transformed into vectors using a modified MSA method, and the vectors are stored in a term vector database (TVDB). Then, the documents are classified by a support vector machine (SVM) and written into the index together with their category labels. During search, the user's search history and personal information are used, guided by the TVDB, to optimize the search results. Experimental results show that the retrieval precision, average discounted cumulative gain (DCG), and average normalized discounted cumulative gain (nDCG) of a system based on this approach reach 0.7, 7.267, and 0.890, respectively, which are 31%, 36%, and 19% higher than the mean of the results obtained with the Lucene method and the Yahoo Directory method. The average time per query is 0.669 s, only 0.326 s more than that of the Lucene method. Therefore, the approach improves the precision and overall relevance of semantic search at a small additional time cost.
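The abstract does not give the formula used to combine a user's search history with the base ranking. The following is a minimal Python sketch of one common way to do it, under stated assumptions: queries and documents are sparse term-weight dictionaries standing in for TVDB entries, the user profile is the mean of past query vectors, and the final score is a linear blend of query relevance and profile similarity with a hypothetical weight `alpha`. The names `cosine`, `build_profile`, and `rerank` are illustrative, not taken from the paper, and personal information other than search history is not modeled.

```python
import math

# Assumption: a sparse term vector is a dict mapping term -> weight,
# standing in for an entry of the paper's term vector database (TVDB).
Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profile(history: list[Vector]) -> Vector:
    """Hypothetical user profile: the mean of the user's past query vectors."""
    profile: Vector = {}
    for vec in history:
        for term, weight in vec.items():
            profile[term] = profile.get(term, 0.0) + weight
    n = max(len(history), 1)
    return {term: weight / n for term, weight in profile.items()}


def rerank(query: Vector, docs: dict[str, Vector],
           history: list[Vector], alpha: float = 0.7) -> list[str]:
    """Order documents by alpha * relevance + (1 - alpha) * user preference.

    alpha is an arbitrary illustrative weight, not a value from the paper.
    Ties are broken by document id so the output is deterministic.
    """
    profile = build_profile(history)
    scored = []
    for doc_id, vec in docs.items():
        relevance = cosine(query, vec)      # how well the document matches the query
        preference = cosine(profile, vec)   # how well it matches the user's history
        scored.append((alpha * relevance + (1 - alpha) * preference, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    query = {"jaguar": 1.0}
    docs = {
        "car_review": {"jaguar": 1.0, "car": 1.0},
        "wildlife": {"jaguar": 1.0, "animal": 1.0},
    }
    print(rerank(query, docs, [{"car": 1.0, "engine": 1.0}]))
    print(rerank(query, docs, [{"animal": 1.0, "zoo": 1.0}]))
```

Both documents match the query `{"jaguar": 1.0}` equally well, so the history decides the order: a car-related history puts `car_review` first, while an animal-related history promotes `wildlife`. Keeping `alpha` above 0.5 lets the query match dominate, so the history adjusts rather than overrides the ranking.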
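DCG and nDCG are graded-relevance metrics: they reward placing highly relevant documents near the top of the result list. The abstract does not state which DCG variant or relevance scale the authors used, so the sketch below assumes the common form DCG = Σ rel_i / log2(i + 1) over 1-based ranks i, with nDCG obtained by dividing by the DCG of the same judgments sorted in descending order. Some evaluations instead use (2^rel_i - 1) as the gain, or build the ideal ranking from all relevant documents in the collection rather than only the retrieved ones; either choice changes the numbers.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over 1-based ranks i."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same judgments."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Graded judgments (0 = irrelevant, 3 = highly relevant) for one ranked result list.
    judged = [3, 2, 3, 0, 1, 2]
    print(f"DCG  = {dcg(judged):.3f}")   # DCG  = 6.861
    print(f"nDCG = {ndcg(judged):.3f}")  # nDCG = 0.961
```

Figures such as "average DCG" are then the mean of these per-query values over the test queries. Because nDCG is normalized to [0, 1], it can be compared across queries with different numbers of relevant documents, whereas raw DCG depends on the relevance scale and list length.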