模式识别与人工智能 (Pattern Recognition and Artificial Intelligence)
Moshi Shibie yu Rengong Zhineng
2009
Issue 5
pp. 709-717
9 pages
郝秀兰 (Hao Xiulan)%陶晓鹏 (Tao Xiaopeng)%王述云 (Wang Shuyun)%徐和祥 (Xu Hexiang)%胡运发 (Hu Yunfa)
Text Categorization%k-Nearest Neighbor (kNN)%Sampling%Feature Selection%Condensing Algorithm
As an instance-based method, the k-nearest neighbor (kNN) classifier has heavy computational and storage requirements. An imbalanced distribution of training data further degrades its performance. To address these shortcomings, a sampling method that combines feature selection with the Condensing technique is proposed; it aims to reduce the computation and storage cost of kNN classification while preserving classifier performance. The method has two steps. First, traditional feature selection methods are used to derive features for each class in the training set. Then, redundant training instances are removed by applying the Condensing strategy, guided by the class features that each document contains. Extensive experiments indicate that using the resulting sample as the training set not only greatly reduces the time and space cost of kNN but also removes noise, which improves classifier performance.
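The two-step idea in the abstract (per-class feature selection, then condensing of the training set) can be illustrated with a small sketch. The abstract does not say which feature-selection scores or which condensing variant the authors use, so everything below is an assumption: the score is a simple in-class minus out-of-class document frequency, the condensing step is a Hart-style condensed nearest neighbor pass with 1-NN, and all function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def select_class_features(docs, labels, top_n=2):
    """Per class, keep the top_n terms ranked by
    (in-class document frequency - out-of-class document frequency).
    This score is an illustrative stand-in, not the paper's method."""
    df = defaultdict(Counter)  # label -> term -> document frequency
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    features = {}
    for label in df:
        def score(term):
            outside = sum(df[other][term] for other in df if other != label)
            return df[label][term] - outside
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(df[label], key=lambda t: (-score(t), t))
        features[label] = ranked[:top_n]
    return features


def vectorize(tokens, vocabulary):
    """Term-count vector restricted to the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocabulary]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(vector, store):
    """Label of the 1-nearest neighbor among (vector, label) pairs."""
    return min(store, key=lambda item: sq_dist(vector, item[0]))[1]


def condense(vectors, labels):
    """Hart-style condensing: start from one instance and add only the
    instances that the current store misclassifies, until a full pass
    adds nothing. Returns the sorted indices of the kept instances."""
    store = [(vectors[0], labels[0])]
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (v, y) in enumerate(zip(vectors, labels)):
            if i in kept:
                continue
            if nearest_label(v, store) != y:
                store.append((v, y))
                kept.append(i)
                changed = True
    return sorted(kept)


# Hypothetical toy corpus: three "sport" and three "finance" documents.
docs = [
    ["ball", "goal", "team"],
    ["goal", "team", "win"],
    ["ball", "team", "match"],
    ["stock", "market", "bank"],
    ["market", "bank", "rate"],
    ["stock", "bank", "loan"],
]
labels = ["sport"] * 3 + ["finance"] * 3

features = select_class_features(docs, labels, top_n=2)
vocabulary = sorted({t for terms in features.values() for t in terms})
vectors = [vectorize(d, vocabulary) for d in docs]

kept = condense(vectors, labels)
store = [(vectors[i], labels[i]) for i in kept]
predicted = nearest_label(vectorize(["team", "goal", "ball"], vocabulary), store)
print(features, kept, predicted)
```

On this toy corpus the condensed store is much smaller than the full training set while the query document is still classified from it, which is the kind of time and space saving the abstract describes. A real experiment would need a proper feature-selection score, k greater than 1, and a held-out test set.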