中国科技论文
China Sciencepaper
2015
Issue 20
pp. 2386-2389
(4 pages)
computer application%feature selection%mutual information%relative term frequency factor%difference degree among classes
An improved mutual information feature selection method is proposed by introducing the difference degree among classes. A relative term frequency factor is also introduced to counter the tendency of traditional methods to select low-frequency words. The method improves the accuracy of feature selection and increases the precision and efficiency of classification. Text classification experiments show that the proposed method raises the average recall and average precision by 11.26% and 8.04%, respectively, and raises the average F1 score, the overall evaluation metric, by 18.55%.
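The low-frequency bias mentioned in the abstract can be illustrated with a minimal sketch of the classical mutual information score, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts. This is only the traditional baseline, not the proposed method: the abstract does not define the relative term frequency factor or the difference degree among classes, so neither is reproduced here. The function name `mutual_information` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Classical MI score per (term, class) pair, from document-level presence.

    docs   : list of token lists (hypothetical toy input)
    labels : list of class names, one per document
    """
    n = len(docs)
    class_count = Counter(labels)   # number of documents in each class
    term_count = Counter()          # number of documents containing each term
    joint_count = Counter()         # documents of class c containing term t
    for tokens, cls in zip(docs, labels):
        for term in set(tokens):    # count presence once per document
            term_count[term] += 1
            joint_count[(term, cls)] += 1

    scores = {}
    for (term, cls), n_tc in joint_count.items():
        # log( P(t, c) / (P(t) * P(c)) ) = log( n_tc * n / (n_t * n_c) )
        scores[(term, cls)] = math.log(
            n_tc * n / (term_count[term] * class_count[cls])
        )
    return scores


# Toy corpus: "offside" occurs in one sports document,
# "match" occurs in all three sports documents and one finance document.
docs = [
    ["goal", "match", "team"],
    ["match", "team", "offside"],
    ["match", "team", "score"],
    ["market", "stock", "team"],
    ["market", "stock", "price"],
    ["market", "stock", "match"],
]
labels = ["sports"] * 3 + ["finance"] * 3

scores = mutual_information(docs, labels)
rare = scores[("offside", "sports")]    # term seen in a single document
common = scores[("match", "sports")]    # term seen in every sports document
print(rare > common)
```

In this toy corpus the term that appears in only one document outscores the term that appears in every document of the class, which is the kind of behaviour a frequency-aware weighting is meant to correct.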