Journal of Liuzhou Teachers College (柳州师专学报)
2011, No. 4, pp. 128-131 (4 pages)
text classification; feature item; feature weight calculation; improved method
TFIDF (Term Frequency-Inverse Document Frequency) is a commonly used method for calculating feature weights in text classification, but it ignores both how feature words are distributed within a text and the length of the text. To address this, the paper proposes the N-TFIDF algorithm, which amends the weight calculation for feature words, and validates it with a classifier. The results show that N-TFIDF achieves better recall and precision than the original TFIDF.
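
For orientation, the unmodified baseline can be sketched as follows. The abstract does not give the N-TFIDF formula, so this is not the paper's method. It is a minimal sketch assuming one common textbook definition of TFIDF: raw term count multiplied by log(N / df). The function name `tfidf` and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Baseline TFIDF sketch (assumed definition: raw count * log(N / df)).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)  # raw term frequency within this document
        weights.append({
            term: count * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): the first document repeated twice gives "a"
# twice the weight, because this baseline does not normalize for length.
short = tfidf([["a", "b", "a"], ["b", "c"]])
long_ = tfidf([["a", "b", "a", "a", "b", "a"], ["b", "c"]])
```

In this sketch the weight depends only on the term count and the document frequency, so neither the position of a term within the text nor the text length affects it. These are the two limitations the abstract names.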