微型机与应用
Microcomputer & its Applications
2015
Issue 21
pp. 81-84 (4 pages)
text classification%rough set%attribute reduction
In text classification, the feature space can reach tens of thousands of dimensions. Even after feature selection with information measures such as document frequency, information gain, and mutual information, the dimensionality usually remains large, and lowering the threshold or reducing the minimum number of selected features may degrade classification performance. To address this problem, a second-stage attribute reduction based on rough set theory is proposed. Experiments show that the method effectively reduces the feature dimensionality while maintaining classification performance.