CAJ | Academic paper

In text classification, the feature space can reach tens of thousands of dimensions. Even after feature selection with information-based measures such as document frequency, information gain, and mutual information, the dimensionality usually remains large, and lowering the threshold or reducing the minimum number of selected features may degrade classification performance. To address this problem, a second-stage attribute reduction based on rough set theory is proposed. Experiments show that the method effectively reduces the feature dimension while preserving classification performance.
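
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the reduction algorithm, so everything here is an assumption made for illustration: the first stage uses a plain document-frequency cutoff, the second stage uses a greedy reduct driven by the rough-set dependency degree (in the style of QuickReduct), and the function names, the toy term-presence matrix, and the `min_df` value are all hypothetical.

```python
from collections import defaultdict

def document_frequency_filter(rows, min_df):
    """Stage 1 (assumed): keep terms that occur in at least min_df documents."""
    n_terms = len(rows[0])
    return [t for t in range(n_terms) if sum(row[t] for row in rows) >= min_df]

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the share of documents whose equivalence
    class under attrs contains a single label (the positive region)."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, labels, candidates):
    """Stage 2 (assumed): repeatedly add the attribute that gives the highest
    dependency degree until it matches that of the full candidate set."""
    target = dependency(rows, labels, candidates)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        remaining = [a for a in candidates if a not in chosen]
        best = max(remaining, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen

# Hypothetical toy data: rows are documents, columns are binary term presence.
rows = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
]
labels = ["sports", "sports", "finance", "finance", "sports", "finance"]

kept = document_frequency_filter(rows, min_df=2)  # first-stage selection
reduct = greedy_reduct(rows, labels, kept)        # second-stage reduction
print("after document-frequency filter:", kept)
print("after rough-set reduction:", reduct)
```

In this toy case the second stage discards terms that do not help separate the classes, while the dependency degree of the reduced set equals that of the first-stage set, which mirrors the claim that the dimension can shrink without losing discriminating power. A greedy reduct is not guaranteed to be minimal, and real text data would typically need discretized term weights rather than binary presence.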