计算机工程
COMPUTER ENGINEERING
2015
Issue 4
Pages 187-189, 194
4 pages in total
query expansion%local co-occurrence analysis%pointwise mutual information algorithm%expansion word%large-scale corpus
To improve query expansion for Uyghur web content, this paper proposes an algorithm for constructing expansion words that combines Uyghur synonym resources with Internet resources. An initial candidate word set is built from Uyghur synonym, near-synonym, and antonym dictionaries. Treating the Internet as a very large-scale corpus and using a search engine as the tool, an improved pointwise mutual information (PMI) measure scores the similarity between the keyword and each word in the initial candidate set; the words are ranked by this score and the top N form candidate expansion word set 1. In parallel, the Internet text that contains the keyword is analyzed using local co-occurrence and PMI to build candidate expansion word set 2. The scores from the two candidate sets are combined by weighted summation, and the highest-ranked words are selected as the final expansion words. Expanded queries were validated through a search engine, and the results show that the algorithm clearly improves query accuracy compared with ordinary keyword queries and synonym-based query expansion.
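The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the abstract does not specify the "improved" PMI, so the standard PMI estimated from search-engine hit counts is used instead, and every function name, the co-occurrence window size, and the weight `alpha` are assumptions introduced here. The hit counts would come from a search engine; none is called in this sketch.

```python
import math
from collections import Counter


def pmi_from_hits(hits_xy, hits_x, hits_y, total_pages):
    """Standard PMI, log2(p(x,y) / (p(x) * p(y))), estimated from page counts.

    Assumption: the paper's improved PMI is not given in the abstract,
    so this is the textbook definition only.
    """
    if hits_xy == 0 or hits_x == 0 or hits_y == 0:
        return float("-inf")
    return math.log2(hits_xy * total_pages / (hits_x * hits_y))


def top_n(scores, n):
    """Keep the n highest-scoring words (candidate set 1 in the abstract)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def cooccurrence_counts(keyword, documents, window=5):
    """Count words appearing within `window` tokens of the keyword.

    Assumption: whitespace tokenization and a fixed window stand in for
    whatever local co-occurrence analysis the paper actually uses.
    """
    counts = Counter()
    for doc in documents:
        tokens = doc.split()
        for i, tok in enumerate(tokens):
            if tok == keyword:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != keyword)
    return counts


def combine(set1, set2, alpha=0.5):
    """Weighted sum of the two candidate score tables (hypothetical weight alpha)."""
    words = set(set1) | set(set2)
    return {
        w: alpha * set1.get(w, 0.0) + (1 - alpha) * set2.get(w, 0.0)
        for w in words
    }
```

In use, the dictionary-derived candidates would be scored with `pmi_from_hits` and trimmed with `top_n`, the co-occurrence candidates would be scored from the retrieved text, and `combine` would produce the ranking from which the final expansion words are taken. In practice the two score tables would likely need to be normalized to a common scale before summation; the abstract does not say how the paper handles this.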