计算机与数字工程
COMPUTER & DIGITAL ENGINEERING
2014
Issue 11
pp. 2066-2068, 2163
4 pages in total
文必龙%李乃峰%任秀英%冯翔%吕鹏全
text feature%word frequency statistics%similarity of ontology concepts%co-occurrence features
The TF-IDF text feature extraction method, which is based on word frequency statistics, does not handle the relations between concepts in a text, so the extracted features suffer from concept redundancy and ambiguity. To address this, a word frequency statistics method based on ontology concept similarity is proposed. The semantic similarity between text elements is used to adjust the word frequency of feature elements, which highlights their semantic contribution, eliminates feature redundancy, and strengthens the independence of the elements in the feature set. Finally, the co-occurrence characteristics of text concepts are used to handle important feature elements that might otherwise be ignored by word frequency statistics, so that text features are extracted accurately and efficiently.
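The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea: raise each term's count by the counts of semantically similar terms, then drop terms that are redundant with a stronger one. The function names, the additive weighting rule, the 0.8 threshold, and the toy similarity table are all assumptions made for illustration and are not taken from the paper. The co-occurrence step is not covered.

```python
from collections import Counter

# Hypothetical pairwise concept similarities in [0, 1]; a real system
# would derive these from an ontology rather than a hand-written table.
TOY_SIMILARITY = {frozenset(("car", "automobile")): 0.9}


def toy_similarity(a, b):
    """Return the assumed similarity of two distinct terms (0.0 if unknown)."""
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def similarity_adjusted_counts(tokens, similarity):
    """Add to each term's raw count the similarity-weighted counts of the
    other terms in the same text (an assumed weighting rule)."""
    counts = Counter(tokens)
    return {
        term: count + sum(
            similarity(term, other) * other_count
            for other, other_count in counts.items()
            if other != term
        )
        for term, count in counts.items()
    }


def select_features(adjusted, similarity, threshold=0.8):
    """Walk terms from highest to lowest adjusted count and keep a term only
    if it is not too similar to any term that has already been kept."""
    kept = []
    for term in sorted(adjusted, key=adjusted.get, reverse=True):
        if all(similarity(term, k) < threshold for k in kept):
            kept.append(term)
    return kept


tokens = ["car", "automobile", "car", "road"]
adjusted = similarity_adjusted_counts(tokens, toy_similarity)
features = select_features(adjusted, toy_similarity)
print(adjusted)
print(features)
```

In this toy input, "car" and "automobile" reinforce each other's counts, and the redundancy filter then keeps only the stronger of the two alongside the unrelated term "road".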