计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2009
Issue 22
pp. 53-55
3 pages
Chinese text categorization; feature selection; feature weighting; classification algorithm
To improve the efficiency and accuracy of Chinese text categorization, this paper presents a new Chinese text classifier. Features are selected with a composite evaluation function based on word frequency, mutual information, and class information. Because the traditional TF-IDF method ignores how a feature is distributed between and within classes, a weighting method is proposed that combines term frequency with the value of the composite evaluation function, in effect replacing the IDF factor with the function used in feature selection. Finally, a fast classifier based on Bayes' theorem is designed. Experiments show that the classifier is simple and effective.
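The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula of the composite evaluation function, so the code below assumes plain pointwise mutual information (maximized over classes) as a stand-in; the function names `mi_scores` and `weight_vector` and the toy corpus are hypothetical and are not taken from the paper. It shows only the general shape: the weight of a term is its term frequency multiplied by its feature-selection score, rather than by IDF.

```python
import math
from collections import Counter


def mi_scores(docs, labels):
    """Score each term by max over classes of log(P(t, c) / (P(t) * P(c))).

    Probabilities are estimated from document counts. This is an assumed
    stand-in for the paper's composite evaluation function, whose exact
    form the abstract does not state.
    """
    n = len(docs)
    class_count = Counter(labels)
    term_count = Counter()
    joint_count = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            joint_count[(term, label)] += 1

    scores = {}
    for term, n_t in term_count.items():
        best = -math.inf
        for c, n_c in class_count.items():
            n_tc = joint_count[(term, c)]
            if n_tc:  # skip classes where the term never occurs
                best = max(best, math.log(n_tc * n / (n_t * n_c)))
        scores[term] = best
    return scores


def weight_vector(tokens, scores):
    """weight(t, d) = tf(t, d) * score(t), with the score in place of IDF."""
    tf = Counter(tokens)
    return {t: tf[t] * scores[t] for t in tf if t in scores}


# Toy, pre-segmented corpus (hypothetical data).
docs = [["ball", "game", "game"], ["ball", "vote"], ["vote", "law"], ["game", "ball"]]
labels = ["sport", "politics", "politics", "sport"]
scores = mi_scores(docs, labels)
weights = weight_vector(["game", "game", "ball"], scores)
```

In this toy corpus, "ball" occurs in both classes and so receives a lower score than "game", which occurs only in sport documents. A class-blind IDF factor cannot draw that distinction directly, which is the shortcoming of TF-IDF that the abstract points to.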