计算机应用研究
APPLICATION RESEARCH OF COMPUTERS
2014
Issue 11
pp. 3299-3303
5 pages in total
陈睿%张亮%杨静%胡荣贵
不均衡数据集%边界少数类样本合成过抽样技术%逆转欠抽样技术%多分类器集成
imbalanced dataset%BSMOTE%inverse under sampling%multiple classifier ensemble
针对传统分类器在数据不均衡的情况下分类效果不理想的缺陷,为提高分类器在不均衡数据集下的分类性能,特别是少数类样本的分类能力,提出了一种基于BSMOTE 和逆转欠抽样的不均衡数据分类算法。该算法使用BSMOTE进行过抽样,人工增加少数类样本的数量,然后通过优先去除样本中的冗余和噪声样本,使用逆转欠抽样方法逆转少数类样本和多数类样本的比例。通过多次进行上述抽样形成多个训练集合,使用Bagging方法集成在多个训练集合上获得的分类器来提高有效信息的利用率。实验表明,该算法较几种现有算法不仅能够提高少数类样本的分类性能,而且能够有效提高整体分类准确度。
Classical classification algorithms perform poorly on imbalanced data sets. To improve classification performance on such data, and in particular the ability to classify minority-class samples, this paper presents a classification algorithm for imbalanced data sets that combines the border synthetic minority oversampling technique (BSMOTE) with inverse under sampling. The algorithm first uses BSMOTE to increase the number of minority-class samples. It then applies an inverse under sampling method that preferentially removes redundant and noisy samples, so that the ratio of majority-class to minority-class samples is inverted. Repeating this sampling procedure several times produces multiple distinct training sets. Bagging is used to combine the classifiers trained on these sets, which makes fuller use of the information in the original data. Experimental results show that, compared with several existing algorithms, the proposed algorithm improves both minority-class classification performance and overall classification accuracy.
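The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: all function names and parameters (`k`, `ratio`, `n_models`) are hypothetical, the base learner is a nearest-centroid rule chosen only for brevity, and plain random under sampling stands in for the paper's preferential removal of redundant and noisy samples, which the abstract does not specify. It assumes at least two minority samples.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, pool, k):
    """Return the k points in pool closest to p, excluding p itself."""
    return sorted((q for q in pool if q is not p), key=lambda q: dist2(p, q))[:k]

def borderline_oversample(minority, majority, n_new, k, rng):
    """Borderline-SMOTE-style step: interpolate from 'danger' minority points."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    danger = []
    for p in minority:
        neigh = sorted((it for it in labelled if it[0] is not p),
                       key=lambda it: dist2(p, it[0]))[:k]
        n_major = sum(1 for _, y in neigh if y == 0)
        if k / 2 <= n_major < k:  # mostly, but not only, majority neighbours
            danger.append(p)
    seeds = danger or minority  # assumption: fall back if no borderline point
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(seeds)
        q = rng.choice(nearest(p, minority, k))
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return minority + synthetic

def inverse_undersample(majority, n_minority, ratio, rng):
    """Keep a random majority subset smaller than the minority set (ratio < 1)."""
    n_keep = max(1, min(len(majority), int(n_minority * ratio)))
    return rng.sample(majority, n_keep)

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def train_centroid(minority, majority):
    """Stand-in base classifier: label 1 (minority) if closer to its centroid."""
    c1, c0 = centroid(minority), centroid(majority)
    return lambda x: 1 if dist2(x, c1) <= dist2(x, c0) else 0

def fit_ensemble(minority, majority, n_models=11, k=3, ratio=0.5, seed=0):
    """Repeat both sampling steps to build several training sets, one model each."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        over = borderline_oversample(minority, majority, len(minority), k, rng)
        under = inverse_undersample(majority, len(over), ratio, rng)
        models.append(train_centroid(over, under))
    return models

def predict(models, x):
    """Bagging-style majority vote over the ensemble."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Toy data (made up for illustration): 5 minority points, 20 majority points.
minority = [(4.0, 4.0), (5.0, 5.0), (4.5, 5.5), (2.0, 2.0), (2.5, 2.0)]
majority = [(x * 0.5, y * 0.5) for x in range(5) for y in range(4)]
models = fit_ensemble(minority, majority)
print(predict(models, (5.0, 5.0)), predict(models, (0.0, 0.0)))
```

Each ensemble member sees a training set in which the minority class outnumbers the majority class, so an individual model leans toward the minority class, while voting across many differently sampled majority subsets lets most majority samples contribute to some model.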