计算机工程与应用
Computer Engineering and Applications
2015, Issue 21, pp. 133-137 (5 pages)
不平衡数据集;权重分配模型;支持向量机(SVM)
imbalanced dataset; weight assignment model; Support Vector Machine (SVM)
SVM在处理不平衡数据分类问题(class imbalance problem)时,其分类结果常倾向于多数类。为此,综合考虑类间不平衡和类内不平衡,提出一种基于聚类权重的分阶段支持向量机(WSVM)。预处理时,采用K均值算法得到多数类中各样本的权重。分类时,第一阶段根据权重选出多数类内各簇边界区域的与少数类数目相等的样本;第二阶段对选取的样本和少数类样本进行初始分类;第三阶段用多数类中未选取的样本对初始分类器进行优化调整,当满足停止条件时,得到最终分类器。通过对UCI数据集的大量实验表明,WSVM在少数类样本的识别率和分类器的整体性能上都优于传统分类算法。
To address the shortcomings of the Support Vector Machine (SVM) on imbalanced datasets, where classification results tend to be biased toward the majority class, this paper presents a phased SVM based on cluster weights (WSVM) that accounts for the uneven distribution of training samples both between classes and within classes. In preprocessing, a K-means algorithm based on a weight assignment model is used to obtain the weight of each majority-class sample. Classification consists of three phases. The first phase uses the weights to select majority-class samples from the boundary region of each cluster, equal in number to the minority-class samples. The second phase trains an initial classifier on the selected samples and the minority-class samples. The third phase uses the unselected majority-class samples to adjust the initial classifier, and the final classifier is obtained once the stopping criterion is satisfied. Extensive experiments on UCI datasets show that WSVM significantly improves the recognition rate of minority-class samples and the overall classification performance.
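The preprocessing step and the first classification phase described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the weight used here (distance to the cluster centroid divided by the largest such distance in that cluster) is an assumption, as are the function names `kmeans`, `boundary_weights`, and `select_boundary` and the round-robin selection across clusters. The second and third phases are not shown because the abstract does not specify the adjustment rule or the stopping criterion.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on a list of coordinate tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def boundary_weights(points, labels, centroids):
    """ASSUMED weight: distance to own centroid / largest such distance in the cluster.

    Samples near a cluster boundary get weights close to 1.
    """
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    radius = {}
    for d, l in zip(dists, labels):
        radius[l] = max(radius.get(l, 0.0), d)
    return [d / radius[l] if radius[l] > 0 else 0.0
            for d, l in zip(dists, labels)]


def select_boundary(majority, n_minority, k=2, seed=0):
    """Return indices of n_minority majority samples, taken round-robin from
    each cluster in order of decreasing weight (hypothetical selection rule)."""
    labels, centroids = kmeans(majority, k, seed=seed)
    weights = boundary_weights(majority, labels, centroids)
    queues = {}
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        queues[c] = sorted(idx, key=lambda i: weights[i], reverse=True)
    chosen = []
    while len(chosen) < n_minority and any(queues.values()):
        for c in range(k):
            if queues[c] and len(chosen) < n_minority:
                chosen.append(queues[c].pop(0))
    return chosen


if __name__ == "__main__":
    # Toy majority class with two well-separated clusters.
    majority = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0),
                (5.0, 5.0), (5.1, 5.0), (6.0, 5.0)]
    print(select_boundary(majority, n_minority=2, k=2))
```

Under these assumptions, the selected indices together with the minority-class samples would form the balanced training set for the initial classifier of the second phase.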