计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015
Issue 6
pp. 1514-1518, 1534
6 pages in total
黄贤英%张金鹏%刘英涛%赵明军
HowNet semantic library%part of speech vector%semantic space mapping%term frequency%short text similarity algorithm
The accuracy of text similarity calculation is restricted to some extent by the limited number of terms included in the HowNet semantic library. To address this, a method of mapping terms onto semantic dimensions is proposed. Each short text is segmented into terms by part of speech, and the terms of the two texts are merged by part of speech to construct part-of-speech vectors. The weight of each term in a part-of-speech vector is then assigned according to term frequency and the HowNet semantic library, so that the similarity calculation between short texts is converted into a similarity calculation between part-of-speech vectors. Experiments on an open benchmark mail dataset show that the proposed algorithm improves both the accuracy and the average similarity value compared with the traditional algorithm.
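
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the weighting formula or the way per-tag scores are combined. The function `word_sim` is a hypothetical toy stand-in for a HowNet-based word similarity. Mapping a missing term to its closest term in the text, scaled by that term's frequency, is an assumption. Averaging cosine similarity over the part-of-speech tags is also an assumption.

```python
from collections import Counter
from math import sqrt


def word_sim(a, b):
    """Hypothetical stand-in for a HowNet-based word similarity in [0, 1]."""
    toy_table = {frozenset(("mail", "letter")): 0.8}  # illustrative value only
    if a == b:
        return 1.0
    return toy_table.get(frozenset((a, b)), 0.0)


def pos_vectors(text_a, text_b, pos):
    """Build two weight vectors over the merged terms of one part of speech.

    Each text is a list of (term, pos_tag) pairs, i.e. already segmented.
    """
    terms_a = Counter(t for t, p in text_a if p == pos)
    terms_b = Counter(t for t, p in text_b if p == pos)
    merged = sorted(set(terms_a) | set(terms_b))  # shared dimensions

    def weights(own):
        vec = []
        for term in merged:
            if term in own:
                # Term occurs in this text: use its term frequency.
                vec.append(float(own[term]))
            else:
                # Assumed mapping rule: best similarity to any term of this
                # text, scaled by that term's frequency.
                vec.append(max((word_sim(term, t) * c for t, c in own.items()),
                               default=0.0))
        return vec

    return weights(terms_a), weights(terms_b)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def short_text_similarity(text_a, text_b):
    """Average the per-tag cosine similarities (the averaging is an assumption)."""
    tags = sorted({p for _, p in text_a} | {p for _, p in text_b})
    if not tags:
        return 0.0
    sims = [cosine(*pos_vectors(text_a, text_b, p)) for p in tags]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    a = [("send", "v"), ("mail", "n")]
    b = [("send", "v"), ("letter", "n")]
    print(short_text_similarity(a, b))
```

In this sketch, two texts that share no term still receive a nonzero score whenever `word_sim` relates their terms. This reflects the stated aim of turning text similarity into similarity between part-of-speech vectors.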