计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 7
pp. 868-876
9 pages in total
support vector machine%active learning%vector cosine%redundancy%balance
Traditional active learning methods for support vector machines (SVMs) rely on Euclidean distance, which cannot effectively measure the degree of correlation between high-dimensional samples and therefore degrades the learner's generalization ability. To address this problem, this paper proposes an SVM active learning strategy based on vector cosine, called COS_SVMactive. During active learning, the method uses vector cosine to measure the information redundancy of samples in the training set, so that the most valuable samples, those carrying important classification information, are selected and passed to experts for manual labeling. Across the iterative labeling rounds, the balance of the training set is gradually adjusted so that the learner achieves better generalization performance. Experimental results show that, compared with conventional SVM active learning based on random sampling (RS_SVMactive) and SVM active learning based on distance (DIS_SVMactive), COS_SVMactive both improves classification accuracy and reduces the expert labeling cost.
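The abstract does not give the exact selection rule, so the following is only a minimal sketch of how a cosine-based redundancy check might be combined with uncertainty sampling. It is not the authors' COS_SVMactive algorithm. The function names, the `max_cos` threshold, and the use of the absolute SVM decision value as the uncertainty score are all assumptions made for illustration. The balance adjustment of the training set is not modelled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_queries(candidates, margins, labeled, batch_size=2, max_cos=0.95):
    """Pick up to batch_size candidate indices to send to the expert.

    candidates: feature vectors from the unlabeled pool
    margins:    |f(x)| of each candidate under the current SVM
                (assumed uncertainty score; smaller = closer to the boundary)
    labeled:    feature vectors already in the training set
    max_cos:    hypothetical redundancy threshold, not taken from the paper
    """
    # Visit the most uncertain candidates first.
    order = sorted(range(len(candidates)), key=lambda i: margins[i])
    reference = list(labeled)  # samples that new queries are compared against
    chosen = []
    for i in order:
        if len(chosen) == batch_size:
            break
        # Redundancy = highest cosine similarity to any reference sample.
        redundancy = max(
            (cosine_similarity(candidates[i], r) for r in reference),
            default=0.0,
        )
        if redundancy <= max_cos:
            chosen.append(i)
            reference.append(candidates[i])  # later picks must differ from this one too
    return chosen


if __name__ == "__main__":
    pool = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [1.0, 1.0]]
    margins = [0.1, 0.2, 0.3, 0.9]
    labeled = [[-1.0, 0.2]]
    print(select_queries(pool, margins, labeled))
```

In this toy pool the second candidate points in almost the same direction as the first, so it is skipped as redundant even though it lies close to the decision boundary. A Euclidean-distance check would treat the two as clearly separated because their lengths differ.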