吉林大学学报(理学版)
JOURNAL OF JILIN UNIVERSITY (SCIENCE EDITION)
2014, No. 6, pp. 1230-1238 (9 pages)
Wu Ruihong (吴瑞红), Lü Xueqiang (吕学强), Li Zhuo (李卓), Shu Yan (舒燕)
multiword expressions; question interpretation; mutual information; search engine
Multiword expressions (MWEs) in questions posted to interactive question answering communities are directly related to how those questions are interpreted. We propose extracting MWEs from the questions of such communities and, based on the characteristics of MWEs in these questions, present an extraction method suited to them. The method first uses mutual information together with stop-word filtering to obtain candidate MWEs from the questions. It then classifies the candidates into four types: correct strings, incomplete strings, redundant strings, and erroneous strings. Finally, it corrects the candidates to obtain the final MWEs, drawing on the query-string optimization performed by search engines and on the web retrieval results for each candidate. Experiments on questions from the Sina iAsk (新浪爱问知识人) question library show that precision, recall, and F-measure reach 84%, 52%, and 0.64, respectively, demonstrating the effectiveness of the proposed method.
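The abstract does not give the mutual-information formula or any thresholds, so the following is only a minimal Python sketch of the candidate-extraction step under stated assumptions. Candidates are taken to be character n-grams. Each candidate's mutual information is assumed to be the minimum pointwise mutual information over all two-way splits of the string. The stop-word filter is assumed to discard any string that begins or ends with a stop word. The helper names, the parameters `max_n`, `min_count` and `min_mi`, the toy questions, and the stop-word list are all hypothetical. The search-engine-based correction stage is not sketched, because it depends on a live search engine.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list (not from the paper).
questions = ["如何治疗感冒", "感冒吃什么药", "感冒了怎么办", "如何预防感冒"]
STOP_WORDS = {"如何", "什么", "怎么"}


def ngram_counts(texts, max_n):
    """Count every character n-gram of length 1..max_n in the texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(ngram, counts, totals):
    """Assumed MI score: the minimum pointwise mutual information
    log2(p(ab) / (p(a) * p(b))) over every split of `ngram` into a + b,
    where p(s) = count(s) / (number of n-grams of length len(s))."""
    def p(s):
        return counts[s] / totals[len(s)]

    return min(
        math.log2(p(ngram) / (p(ngram[:k]) * p(ngram[k:])))
        for k in range(1, len(ngram))
    )


def extract_candidates(texts, stop_words, max_n=4, min_count=2, min_mi=1.0):
    """Return candidate MWEs sorted by descending MI score.

    max_n, min_count and min_mi are hypothetical thresholds.
    """
    counts = ngram_counts(texts, max_n)
    totals = Counter()
    for s, c in counts.items():
        totals[len(s)] += c

    scores = {}
    for s, c in counts.items():
        if len(s) < 2 or c < min_count:
            continue  # single characters and rare strings are skipped
        # Stop-word filter: drop strings that begin or end with a stop word.
        if any(s.startswith(w) or s.endswith(w) for w in stop_words):
            continue
        score = min_pmi(s, counts, totals)
        if score >= min_mi:
            scores[s] = score
    return sorted(scores, key=scores.get, reverse=True)


print(extract_candidates(questions, STOP_WORDS))  # ['感冒']
```

On the toy data, 如何 ("how") scores higher than 感冒 ("cold"): about 3.85 bits against about 2.85 bits. The stop-word filter nevertheless removes it, which leaves 感冒 as the only candidate. For the reported figures, the F-measure is the harmonic mean of precision and recall: F = 2PR / (P + R) = 2 × 0.84 × 0.52 / (0.84 + 0.52) ≈ 0.64, which is consistent with the abstract.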