Transport Science and Engineering (交通科学与工程)
JOURNAL OF CHANGSHA COMMUNICATIONS UNIVERSITY
2014, No. 4, pp. 77-82 (6 pages)
Keywords: congestion identification; imbalanced classification; resampling method; cross combination; classifier
To address the imbalanced distribution of congestion data, a new resampling method, the cross combination resampling method, is proposed. It combines random undersampling with SMOTE and resamples the original data in a crossed manner, reducing the damage that resampling does to the non-uniform distribution of the original data. Original samples of non-congested and congested data in a ratio of 1:10.1 were obtained through simulation. In line with practical conditions, the cross resampling method was used to produce datasets with class ratios of 1:5, 1:3 and 1:1, and the classification results for the three cases were compared and analyzed. Finally, a naive Bayes classifier, a Bayesian network classifier and a neural network classifier were used to compare the cross combination resampling method with the common combination resampling method on datasets of different ratios. The experimental results show that the cross combination resampling method better alleviates the problems that imbalanced congestion data cause for classifiers.
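The abstract describes combining random undersampling with SMOTE to reach target class ratios such as 1:5, 1:3 and 1:1, but it does not specify how the "crossing" step works. The following is a minimal NumPy sketch of a generic (non-crossed) combination of the two techniques, not the authors' method. The function names, the `alpha` parameter that splits the rebalancing between the two techniques, and the Gaussian demo data are all hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # requires at least 2 minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def combined_resample(X_maj, X_min, ratio, alpha=0.5, k=5, seed=0):
    """Combine random undersampling of the majority class with SMOTE on the
    minority class so that minority : majority is roughly 1 : ratio.

    alpha (hypothetical parameter) splits the work between the two methods:
    alpha=1 is pure random undersampling, alpha=0 is pure SMOTE.
    """
    rng = np.random.default_rng(seed)
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)
    m, n = len(X_maj), len(X_min)
    if m < ratio * n:
        raise ValueError("data is already less imbalanced than the target ratio")
    m_new = int(round(m - alpha * (m - ratio * n)))  # majority size after undersampling
    n_new = max(n, int(round(m_new / ratio)))        # minority size after SMOTE
    keep = rng.choice(m, size=m_new, replace=False)  # random undersampling
    synthetic = smote(X_min, n_new - n, k=k, rng=rng)
    return X_maj[keep], np.vstack([X_min, synthetic])


# Hypothetical 2-D data with a 1 : 10.1 imbalance (100 vs 1010 samples).
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(1010, 2))
X_min = rng.normal(3.0, 1.0, size=(100, 2))
for r in (5, 3, 1):
    maj_res, min_res = combined_resample(X_maj, X_min, ratio=r)
    print(f"1:{r} -> majority {len(maj_res)}, minority {len(min_res)}")
```

With `alpha=0.5`, this prints majority/minority sizes of 755/151, 655/218 and 555/555. The split reflects a common trade-off. Undersampling alone discards majority information. SMOTE alone adds many synthetic minority points, which can encourage overfitting. Splitting the change moderates both effects.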