计算机应用与软件
COMPUTER APPLICATIONS AND SOFTWARE
2013
Issue 10
pp. 243-245
3 pages in total
文本分类%特征提取%平均词频%类间集中度
Text classification%Feature extraction%Average word frequency%Between-class concentration
文本分类中特征提取对分类效果有较大的影响,传统的特征提取方法在特征分布信息的量化方面存在不足。为此,提出一种基于特征词类内、类外平均词频的特征提取算法。算法通过特征词的平均词频类间集中度和文档频类间集中度来计算特征词的权重,能够更准确地反映特征词的分布情况。通过实验结果比较,可以证明,该算法有效地提高了分类效果。
Feature extraction has a considerable impact on text classification results, and traditional feature extraction methods fall short in quantifying feature distribution information. To address this, a feature extraction algorithm based on the average word frequency of feature words inside and outside each class is proposed. The algorithm computes the weight of each feature word from the between-class concentration of its average word frequency and the between-class concentration of its document frequency, which reflects the distribution of feature words more accurately. Experimental comparisons show that the algorithm effectively improves classification results.
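The abstract names the two ingredients of the weight (between-class concentration of average word frequency and of document frequency) but does not state the formula. The sketch below is only an illustration under stated assumptions: average word frequency is taken as occurrences per document within a group, each concentration is taken as the in-class value divided by the sum of the in-class and out-of-class values, and the two concentrations are multiplied. The function name `feature_weights` and all of these choices are hypothetical and may differ from the paper's actual definitions.

```python
from collections import Counter


def feature_weights(docs, labels, target):
    """Score terms for class `target`.

    Hypothetical scoring, NOT the paper's formula (the abstract does not give it).
    docs: list of token lists; labels: parallel list of class labels.
    """
    in_docs = [d for d, y in zip(docs, labels) if y == target]
    out_docs = [d for d, y in zip(docs, labels) if y != target]

    def stats(group):
        tf, df = Counter(), Counter()
        for doc in group:
            tf.update(doc)       # total occurrences of each term
            df.update(set(doc))  # number of documents containing each term
        n = max(len(group), 1)   # avoid division by zero for an empty group
        avg_tf = {t: tf[t] / n for t in tf}    # assumed: occurrences per document
        doc_rate = {t: df[t] / n for t in df}  # fraction of documents with the term
        return avg_tf, doc_rate

    in_tf, in_df = stats(in_docs)
    out_tf, out_df = stats(out_docs)

    weights = {}
    for t in in_tf:  # only terms that occur inside the target class
        # Assumed concentration: in-class share of the in-class + out-of-class total.
        tf_conc = in_tf[t] / (in_tf[t] + out_tf.get(t, 0.0))
        df_conc = in_df[t] / (in_df[t] + out_df.get(t, 0.0))
        weights[t] = tf_conc * df_conc  # assumed combination: simple product
    return weights


# Toy corpus (made up for illustration).
docs = [["goal", "match", "goal"], ["match", "team"],
        ["stock", "market"], ["market", "match"]]
labels = ["sport", "sport", "finance", "finance"]
weights = feature_weights(docs, labels, "sport")
```

In this toy corpus, "goal" and "team" occur only in the sport class and so receive the maximum score under these assumptions, while "match" also appears in a finance document and is scored lower.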