计算机研究与发展
JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT
2015
Issue 7
pp. 1499-1509
11 pages in total
word semantic similarity%semantic similarity%piecewise linear interpolation%Naïve Bayes model%WordNet
Measuring semantic similarity between words is a classical and active problem in natural language processing, and progress on it affects many applications, such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. A novel approach is proposed that measures word semantic similarity by combining a Naïve Bayes model with a knowledge base. First, attribute variables are extracted from the general-purpose ontology WordNet; next, conditional probability distributions are generated by statistics and piecewise linear interpolation; then, posterior probabilities are obtained through Bayesian inference, which fuses the information; finally, word semantic similarity is quantified from the posterior. The main contributions are definitions of the distance and depth of a word pair, which require little computation and discriminate well between word senses, and a word semantic similarity measure based on the Naïve Bayes model. On the benchmark data set R&G (65), the method is evaluated with 5-fold cross validation. The sample Pearson correlation between the algorithm's results and human judgments is 0.912, which is 0.4% higher than the best existing method and 7% to 13% higher than classical methods. The Spearman correlation is 0.873, which is 10% to 20% higher than classical methods. The running efficiency of the method is comparable to that of the classical methods. These results indicate that combining a Naïve Bayes model with a knowledge base to measure word semantic similarity is reasonable and effective.
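The pipeline described in the abstract (attribute variables, interpolated conditional probabilities, Bayesian fusion, a similarity score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The similarity levels, priors, knot positions, and likelihood tables below are made-up illustrative numbers, and the names `interp`, `posterior`, and `similarity` are hypothetical. The word-pair distance and depth are assumed to have been computed from WordNet already, and taking the posterior expectation as the final score is an assumption, since the abstract does not say how the posterior is turned into a similarity value.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise linear interpolation through knots (xs, ys); clamps outside the knot range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # index of the knot at or just left of x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical similarity levels on a 0-4 scale, with illustrative priors.
LEVELS = [0.0, 2.0, 4.0]                 # low, medium, high
PRIOR = [0.5, 0.3, 0.2]

# Illustrative likelihood values at a few knots, one row per level.
# In a real system these would be estimated from training statistics.
DIST_KNOTS = [0, 5, 10, 20]
DIST_LIK = [
    [0.02, 0.10, 0.40, 0.90],            # low similarity: long distances are likely
    [0.10, 0.60, 0.40, 0.10],
    [0.90, 0.30, 0.05, 0.01],            # high similarity: short distances are likely
]
DEPTH_KNOTS = [0, 5, 10, 15]
DEPTH_LIK = [
    [0.80, 0.40, 0.10, 0.05],
    [0.30, 0.60, 0.40, 0.20],
    [0.05, 0.30, 0.70, 0.90],
]


def posterior(distance, depth):
    """Naive Bayes fusion: P(level | distance, depth) is proportional to
    P(level) * P(distance | level) * P(depth | level)."""
    joint = [
        PRIOR[k]
        * interp(distance, DIST_KNOTS, DIST_LIK[k])
        * interp(depth, DEPTH_KNOTS, DEPTH_LIK[k])
        for k in range(len(LEVELS))
    ]
    total = sum(joint)                   # positive because every table entry is positive
    return [j / total for j in joint]


def similarity(distance, depth):
    """Assumed scoring rule: expected similarity level under the posterior."""
    return sum(level * p for level, p in zip(LEVELS, posterior(distance, depth)))


if __name__ == "__main__":
    print(similarity(1, 12))             # short distance, large depth
    print(similarity(15, 2))             # long distance, shallow depth
```

With these made-up tables, a word pair with a short distance and a large depth scores higher than a pair with a long distance and a shallow depth. The likelihood rows are not normalized densities; the normalization inside `posterior` only guarantees that the posterior itself sums to 1.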