Computer Engineering (计算机工程)
2014, No. 9, pp. 248-251, 256 (5 pages in total)
Keywords: sample preprocessing; balanced sampling; weight adjustment; generalization performance; minimum distance to class center; sample distinguishability
Abstract: Since classification algorithms separate classes according to how distinguishable the samples are, a method is proposed that raises sample distinguishability by adding a sample attribute: in the preprocessing stage, an attribute dmin (the minimum distance from a sample to any class center) is appended to every sample to strengthen the distinction between samples. To address the original Adaboost algorithm's insufficient training on some classes, caused by uneven sampling in the sampling stage, a balanced sampling method is adopted that keeps the proportion of samples drawn from each class constant. To address the overly fast growth of sample weights in the original algorithm, a new weight-adjustment strategy is given that introduces a per-sample misclassification counter count(n), effectively restraining the growth rate of the sample weights. Combining these, an improved algorithm, Sampling equilibrium & Weight adjustment & Add attribute Adaboost (SWA-Adaboost), is proposed. Experiments comparing the improved algorithm with the original on six datasets from the UCI machine learning repository of the University of California show that SWA-Adaboost outperforms Adaboost in generalization performance, lowering the generalization error by 9.54% on average.
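The abstract describes the three modifications only at a high level, so the sketch below is illustrative rather than a reproduction of the paper's method. It shows, in Python, how the dmin attribute, per-class balanced sampling, and a count(n)-damped weight update could be wired into a binary AdaBoost loop. The specific damping factor 1/(1 + count(n)) and the choice of depth-1 decision trees as weak learners are assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def class_centers(X, y):
    """Mean vector of each class; used to compute the dmin attribute."""
    return np.array([X[y == c].mean(axis=0) for c in np.unique(y)])

def append_dmin(X, centers):
    """Preprocessing step: append each sample's minimum distance to any
    class center as an extra attribute (the paper's dmin)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.hstack([X, d.min(axis=1, keepdims=True)])

def balanced_sample(y, w, rng):
    """Balanced sampling: a weighted bootstrap drawn per class, so the
    class proportions of the training set are preserved in every round."""
    idx = []
    for c in np.unique(y):
        members = np.where(y == c)[0]
        p = w[members] / w[members].sum()
        idx.extend(rng.choice(members, size=members.size, p=p))
    return np.asarray(idx)

def swa_adaboost_fit(X, y, n_rounds=50, seed=0):
    """SWA-Adaboost-style training loop for labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    centers = class_centers(X, y)
    Xa = append_dmin(X, centers)
    n = y.size
    w = np.full(n, 1.0 / n)
    count = np.zeros(n)                 # count(n): misclassification counter
    learners, alphas = [], []
    for _ in range(n_rounds):
        idx = balanced_sample(y, w, rng)
        stump = DecisionTreeClassifier(max_depth=1).fit(Xa[idx], y[idx])
        pred = stump.predict(Xa)
        miss = pred != y
        err = w[miss].sum()
        if err == 0 or err >= 0.5:      # standard AdaBoost stopping rule
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        count += miss
        # Assumed damping: the usual exponent -alpha*y*h(x) is divided by
        # (1 + count), so often-misclassified samples' weights grow more slowly.
        w *= np.exp(-alpha * y * pred / (1.0 + count))
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return centers, learners, np.asarray(alphas)

def swa_adaboost_predict(X, centers, learners, alphas):
    """Weighted-majority vote of the weak learners; test samples get the
    dmin attribute computed against the training-set class centers."""
    Xa = append_dmin(X, centers)
    scores = sum(a * h.predict(Xa) for a, h in zip(alphas, learners))
    return np.sign(scores)
```

In this assumed scheme the damping divides the exponent of the weight update rather than the weight itself, which keeps the update symmetric for correctly and incorrectly classified samples while still slowing the weight growth of chronically misclassified points, in the spirit of the count(n) strategy the abstract describes.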