计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2013
Issue 22
pp. 110-114
5 pages
text categorization%feature selection%chi-square statistic%within-class distribution%between-class distribution
CHI (the chi-square statistic) is a commonly used feature selection method for text. To address the shortcomings of this model, the CHI model is optimized step by step on the basis of term frequency, considering the distribution of a term within a class, its distribution between classes, and its distribution across the different documents within a class, so that term frequency information is used effectively. An improved CHI model based on term frequency information is proposed. Subsequent text categorization experiments demonstrate the effectiveness of the optimized CHI model.
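
For orientation, the baseline that the abstract refers to can be sketched as below. This is the standard textbook chi-square score of a term for one class, computed from document counts. It is not the improved model proposed in the paper, whose formula is not given in this record. The function name `chi_square` and the argument names `a`, `b`, `c`, `d` are illustrative assumptions.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score of a term t for a class k (baseline CHI).

    All four arguments are document counts (names are illustrative):
      a: documents in class k that contain t
      b: documents outside class k that contain t
      c: documents in class k that do not contain t
      d: documents outside class k that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal total means the term or class carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# A term that appears in every document of class k and nowhere else
# scores highest; a term spread evenly over both groups scores zero.
print(chi_square(10, 0, 0, 10))
print(chi_square(5, 5, 5, 5))
```

The score depends only on whether a term occurs in a document, not on how often it occurs there, which is the kind of term frequency information the abstract says the improved model brings in.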