计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 6
pp. 92-95
4 pages in total
刘余霞%刘三民%刘涛%王忠群
imbalanced data learning%oversampling%data classification
To address the asymmetry of class distribution information in imbalanced data sets, a new oversampling algorithm, DB_SMOTE (Distance-based Synthetic Minority Over-sampling Technique), is proposed. It tackles the shortage of minority samples by synthesizing new minority-class samples. Seed samples are extracted according to the distance between each sample and the class centre, combined with the degree of class aggregation. Following the idea of SMOTE (Synthetic Minority Over-sampling Technique), new minority-class samples are synthesized from the seed samples. The distribution function of the new samples is constructed from the distance between each seed sample and the minority-class centre. Classification experiments on multiple data sets using this sampling algorithm show that DB_SMOTE is feasible.
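The abstract gives only the outline of the procedure, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. The function name `db_smote_sketch`, the parameter `seed_quantile`, the rule that seeds are the samples farthest from the centre, the distance-proportional weighting, and the choice to interpolate toward the class centre (instead of toward nearest neighbours, as standard SMOTE does) are all hypothetical choices made here to show how the three steps could fit together.

```python
import numpy as np


def db_smote_sketch(X_min, n_new, seed_quantile=0.5, rng=None):
    """Illustrative distance-based oversampling; NOT the paper's exact method."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)

    # Step 1: minority-class centre and each sample's Euclidean distance to it.
    centre = X_min.mean(axis=0)
    dist = np.linalg.norm(X_min - centre, axis=1)

    # Assumption: seeds are the samples at or beyond a quantile of the
    # distances (a crude stand-in for the "class aggregation" criterion).
    mask = dist >= np.quantile(dist, seed_quantile)
    seeds, seed_dist = X_min[mask], dist[mask]

    # Step 3 (assumed form): a seed's share of the new samples is
    # proportional to its distance from the minority-class centre.
    total = seed_dist.sum()
    if total > 0:
        weights = seed_dist / total
    else:
        weights = np.full(len(seeds), 1.0 / len(seeds))
    picks = rng.choice(len(seeds), size=n_new, p=weights)

    # Step 2: SMOTE-style linear interpolation. Assumption: the partner point
    # is the class centre rather than a nearest minority neighbour.
    gaps = rng.random((n_new, 1))
    return seeds[picks] + gaps * (centre - seeds[picks])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0],
                         [4.0, 4.0], [2.0, 2.0], [1.0, 1.0]])
    synthetic = db_smote_sketch(minority, n_new=5, rng=0)
    print(synthetic.shape)
```

Because every synthetic point lies on the segment between a seed and the class centre, the new samples stay inside the region already spanned by the minority class. A neighbour-based interpolation would follow the original SMOTE more closely, at the cost of a nearest-neighbour search.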