Computer Engineering and Applications (计算机工程与应用)
2014, Issue 6, pp. 127-131 (5 pages)
Dong Lili (董丽丽), Li Huan (李欢), Zhang Xiang (张翔), Liu Yanfeng (刘闫锋)
Keywords: domain concept extraction; improved affinity propagation; log-likelihood ratio; semantic similarity; mutual information
To address the lack of word-level semantic information in statistical approaches to domain concept extraction, this paper proposes an automatic domain concept extraction method that combines semantic similarity with an improved affinity propagation algorithm. Compound words are extracted using mutual information; the log-likelihood ratio is used to avoid omitting low-frequency words; synonyms among terms are identified using HowNet and cosine similarity; and the improved affinity propagation algorithm produces the final set of domain concepts. Experimental results show that the method substantially improves on traditional methods in accuracy, recall, and perplexity change rate.
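The statistical measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of pointwise mutual information over adjacent word pairs, and the 2x2 contingency-table form of the log-likelihood ratio are assumptions chosen for illustration, since the abstract gives no formulas. The HowNet lookup and the improved affinity propagation step are omitted because the abstract does not describe them in enough detail.

```python
import math


def pmi(pair_count, left_count, right_count, total):
    """Pointwise mutual information (base 2) of an adjacent word pair.

    Assumed setup: counts come from one corpus with `total` tokens.
    A high value suggests the pair behaves like a compound word.
    """
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    Assumed layout (hypothetical, for illustration):
      k11 = term count in the domain corpus
      k12 = term count in a background corpus
      k21 = count of all other tokens in the domain corpus
      k22 = count of all other tokens in the background corpus
    Cells with a zero count contribute nothing to the sum.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)
```

In this sketch, candidate compounds would be ranked by `pmi`, candidate terms by `log_likelihood_ratio`, and term pairs by `cosine_similarity` over whatever context vectors are available. Any thresholds would need to be tuned on real data.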