电子与信息学报
JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY
2015
Issue 7
pp. 1726-1732
7 pages
Data mining; Supervised learning; Classification of imbalanced data sets; Influence function; k-Nearest Neighbor (kNN)
Classification is a supervised learning method that determines the class label of an unlabeled instance using a model learned from a training data set. Unlike traditional approaches, this paper views classification from the perspective of influence functions: the class label of an unlabeled instance is determined by the influence that the training set exerts on it. First, the idea of classification based on influence functions is introduced. Second, the influence function is defined and three influence functions are designed. Finally, a k-nearest neighbor (kNN) classification method based on these three influence functions is proposed and applied to the classification of imbalanced data sets. Experimental results on 18 UCI data sets show that the proposed method achieves better classification performance than the traditional k-nearest neighbor method and is effective for imbalanced data set classification.
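The abstract does not state the three influence functions, so the following is only a minimal sketch of the general idea, assuming a hypothetical inverse-distance influence. The names `inverse_distance_influence` and `influence_knn_predict` are illustrative and do not come from the paper. Each of the k nearest training samples contributes an influence to its own class, and the class with the largest total influence is assigned to the query.

```python
import math
from collections import defaultdict


def inverse_distance_influence(distance, eps=1e-9):
    # Hypothetical influence function (an assumption, not one of the
    # paper's three): closer training samples exert more influence.
    return 1.0 / (distance + eps)


def influence_knn_predict(train_points, train_labels, query, k=3,
                          influence=inverse_distance_influence):
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Keep only the k nearest neighbors.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Accumulate the influence each class exerts on the query.
    totals = defaultdict(float)
    for distance, label in neighbors:
        totals[label] += influence(distance)
    # Predict the class with the largest total influence.
    return max(totals, key=totals.get)
```

With one minority sample at (0, 0), two majority samples at (3, 0) and (0, 3), and a query at (0.5, 0.5), a plain majority vote over k = 3 neighbors would favor the majority class, whereas this influence-weighted vote favors the nearby minority sample. That is the kind of behavior that may help on imbalanced data, though whether the paper's functions behave this way is not stated in the abstract.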