中国科学技术大学学报
中國科學技術大學學報
중국과학기술대학학보
JOURNAL OF UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA
2010
No. 2
pp. 146-151
6 pages in total
程有龙%庄连生%李斌%庄镇泉
程有龍%莊連生%李斌%莊鎮泉
정유룡%장련생%리빈%장진천
Boosting%小样本集%非平衡训练集%特征置换%扰动%泛化能力
Boosting%小樣本集%非平衡訓練集%特徵置換%擾動%汎化能力
Boosting%소양본집%비평형훈련집%특정치환%우동%범화능력
Boosting%small sample set%imbalanced training set%feature knockout%perturbation%generalization ability
传统的Boosting算法训练出的分类器常会出现过拟合和向多数类偏移.为此,提出一种基于自适应样本注入和特征置换的Boosting学习算法,通过在训练过程中加入人工合成样本,逐渐平衡训练集,并通过合成的样本对分类器学习进行扰动,使分类器选择更多有效的特征,提高了分类器的泛化能力.最后,在两类和多类图片分类问题上对该算法的有效性进行了考察,实验结果表明,该算法能够在样本数很少,且正负样本数量极不均衡的情况下,有效提高Boosting算法的泛化能力.
傳統的Boosting算法訓練齣的分類器常會齣現過擬閤和嚮多數類偏移.為此,提齣一種基于自適應樣本註入和特徵置換的Boosting學習算法,通過在訓練過程中加入人工閤成樣本,逐漸平衡訓練集,併通過閤成的樣本對分類器學習進行擾動,使分類器選擇更多有效的特徵,提高瞭分類器的汎化能力.最後,在兩類和多類圖片分類問題上對該算法的有效性進行瞭攷察,實驗結果錶明,該算法能夠在樣本數很少,且正負樣本數量極不均衡的情況下,有效提高Boosting算法的汎化能力.
전통적Boosting산법훈련출적분류기상회출현과의합화향다수류편이.위차,제출일충기우자괄응양본주입화특정치환적Boosting학습산법,통과재훈련과정중가입인공합성양본,축점평형훈련집,병통과합성적양본대분류기학습진행우동,사분류기선택경다유효적특정,제고료분류기적범화능력.최후,재량류화다류도편분류문제상대해산법적유효성진행료고찰,실험결과표명,해산법능구재양본수흔소,차정부양본수량겁불균형적정황하,유효제고Boosting산법적범화능력.
Traditional Boosting algorithms tend to overfit and to be biased towards the majority class on small and imbalanced training sets. To address this issue, an improved Boosting learning algorithm based on adaptive sample injection and feature knockout is proposed. During training, synthetic samples are added to the original training set to rebalance it gradually. The synthetic samples also perturb classifier learning so that the classifier selects more effective features, which improves its generalization ability. The method was tested on both two-class and multi-class image classification problems. Experimental results show that when the number of training samples is small and the numbers of positive and negative samples are highly imbalanced, the proposed method effectively improves the generalization performance of Boosting algorithms.
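
The abstract gives no implementation details, so the following is only a minimal sketch of how sample injection by feature replacement might look, not the authors' algorithm. The function names (`knock_out`, `grow_minority`), the `per_round` cap, the uniformly random choice of the replaced feature, and the use of a majority-class sample as the donor are all assumptions made for illustration.

```python
import random


def knock_out(sample, donor, feature_index):
    """Return a copy of `sample` with one feature value taken from `donor`."""
    synthetic = list(sample)
    synthetic[feature_index] = donor[feature_index]
    return synthetic


def grow_minority(minority, majority, per_round, rng):
    """Return the minority set plus at most `per_round` synthetic samples.

    Assumption: called once per boosting round, so the training set is
    rebalanced gradually and never grows past the majority-class size.
    """
    deficit = max(0, len(majority) - len(minority))
    n_new = min(per_round, deficit)
    grown = [list(s) for s in minority]  # copy, so the inputs stay untouched
    for _ in range(n_new):
        base = rng.choice(minority)      # minority sample to perturb
        donor = rng.choice(majority)     # hypothetical donor choice
        j = rng.randrange(len(base))     # hypothetical: uniform feature choice
        grown.append(knock_out(base, donor, j))
    return grown
```

Calling `grow_minority` before each boosting round would add a few perturbed minority samples at a time. A fuller version would presumably pick the replaced feature from the weak learner selected in the previous round, which is what would push the classifier towards other features; that coupling is left out here.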