计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2015, No. 7, pp. 869-876 (8 pages)
Keywords: support vector machine; imbalanced dataset; over-sampling; under-sampling; K-nearest neighbor
*The Scientific Research Program Funded by Education Department of Shaanxi Province under Grant No. 12JK0748; the Science and Technology Research Project of Shangluo University under Grant No. 13sky024.
To mitigate classifier overfitting and improve classification performance, this paper proposes an under-sampling method for imbalanced data classification based on K-nearest neighbors in kernel space. First, the algorithm computes, in kernel space, the k nearest neighbors of each sample among the samples of the opposite class, together with the average distance between samples of the two classes. Next, it deletes samples that lie far from the classification boundary, as determined by control parameters, and applies SMOTE over-sampling to the minority (small) class to generate a new, balanced sample set. Finally, it obtains the final decision function by training on the new dataset. The algorithm aims to alleviate the class-imbalance problem and improve the classification performance of SVM. Experiments on an artificial dataset and four UCI datasets show that the algorithm is effective for imbalanced datasets.
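The pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact control parameters, so the following choices are all assumptions: the Gaussian RBF kernel and its `gamma`; the deletion rule, which drops majority samples whose mean distance to their k nearest minority neighbours exceeds `alpha` times the average of that score over the majority class; the decision to over-sample the minority class up to the size of the reduced majority class; and the hypothetical helper names `undersample_majority`, `smote`, and `rebalance`. Kernel-space distances use the kernel trick, ||φ(x) − φ(y)||² = K(x, x) − 2K(x, y) + K(y, y), so the feature map φ never has to be computed explicitly.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(x, y, kernel=rbf_kernel):
    """||phi(x) - phi(y)|| via the kernel trick: K(x,x) - 2K(x,y) + K(y,y)."""
    d2 = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding errors


def knn_mean_distance(x, others, k, kernel=rbf_kernel):
    """Mean kernel-space distance from x to its k nearest points in `others`."""
    nearest = sorted(kernel_distance(x, o, kernel) for o in others)[:k]
    return sum(nearest) / len(nearest)


def undersample_majority(majority, minority, k=3, alpha=1.0):
    """Hypothetical deletion rule: drop majority samples whose mean distance to
    their k nearest minority neighbours exceeds alpha times the average of that
    score over the majority class, i.e. samples far from the boundary."""
    scores = [knn_mean_distance(x, minority, k) for x in majority]
    threshold = alpha * sum(scores) / len(scores)
    return [x for x, s in zip(majority, scores) if s <= threshold]


def smote(minority, n_new, k=3, rng=None):
    """Basic SMOTE: interpolate between a random minority sample and one of its
    k nearest minority neighbours (Euclidean distance, input space).
    Assumes at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


def rebalance(majority, minority, k=3, alpha=1.0, seed=0):
    """Under-sample the majority class (label -1), then SMOTE the minority
    class (label +1) up to the size of the reduced majority class."""
    kept = undersample_majority(majority, minority, k, alpha)
    n_new = max(len(kept) - len(minority), 0)
    new_minority = minority + smote(minority, n_new, k, random.Random(seed))
    X = kept + new_minority
    y = [-1] * len(kept) + [1] * len(new_minority)
    return X, y


if __name__ == "__main__":
    majority = [[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [0.2, 0.8], [6.0, 6.0], [0.1, 0.1]]
    minority = [[1.0, 1.0], [1.2, 0.9]]
    X, y = rebalance(majority, minority, k=2, alpha=1.0)
    for point, label in zip(X, y):
        print(label, point)
```

With an RBF kernel, the kernel-space distance is a monotone function of Euclidean distance, so neighbour rankings match those in input space and only the distance values (and hence the threshold) change. A different kernel, such as a polynomial kernel, can change the rankings as well. The returned `X`, `y` would then be passed to an SVM trainer (for example scikit-learn's `sklearn.svm.SVC`) to obtain the final decision function.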