Computer Engineering (计算机工程)
2014
Issue 5
pp. 183-187, 191 (6 pages in total)
feature transformation%labeled instance set%difference%bootstrap sampling%generalization ability
This paper proposes a Tri-Training algorithm based on feature transformation. Feature transformation maps the labeled instance set into new spaces to obtain diverse training sets, from which accurate yet diverse base classifiers are built; this avoids the problem that bootstrap sampling cannot make full use of the entire labeled instance set. To make full use of the class distribution information of the data, a feature transformation method based on Must-link and Cannot-link constraint sets (TMC) is designed and applied in the feature-transformation-based Tri-Training algorithm. Experimental results on UCI data sets show that, under different unlabeled rates, the proposed algorithm achieves higher accuracy than the classic Co-Training and Tri-Training algorithms on most data sets. In addition, compared with the Tri-LDA and Tri-CP algorithms, the TMC-based Tri-Training algorithm has better generalization ability.
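The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every name in it is hypothetical. The abstract does not specify how TMC is computed, so `scaling_transform` (a random positive per-feature scaling) is only a stand-in for a feature transformation. The base learner is a simple nearest-centroid classifier chosen for brevity. The error-rate conditions that the original Tri-Training algorithm uses to accept pseudo-labeled instances are omitted. What the sketch does show is that each base classifier is trained on the whole labeled set in its own transformed space, rather than on a bootstrap sample.

```python
import random
from collections import Counter


def scaling_transform(dim, seed):
    """Hypothetical stand-in transformation: one positive weight per feature."""
    rng = random.Random(seed)
    return [rng.uniform(0.5, 1.5) for _ in range(dim)]


class CentroidClassifier:
    """Nearest-centroid base learner that works in a transformed space."""

    def __init__(self, weights):
        self.weights = weights
        self.centroids = {}

    def _transform(self, x):
        return [w * v for w, v in zip(self.weights, x)]

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            z = self._transform(x)
            acc = sums.setdefault(label, [0.0] * len(z))
            for i, v in enumerate(z):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        z = self._transform(x)
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(z, self.centroids[c])),
        )


def tri_training(X_lab, y_lab, X_unlab, rounds=5):
    """Simplified Tri-Training loop (assumption: no error-rate acceptance test)."""
    dim = len(X_lab[0])
    # Each learner sees ALL labeled instances, but in a different space.
    learners = [
        CentroidClassifier(scaling_transform(dim, seed)).fit(X_lab, y_lab)
        for seed in range(3)
    ]
    for _ in range(rounds):
        preds = [[clf.predict(x) for x in X_unlab] for clf in learners]
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label the instances on which the other two learners agree.
            extra = [
                (x, preds[j][n])
                for n, x in enumerate(X_unlab)
                if preds[j][n] == preds[k][n]
            ]
            X_aug = X_lab + [x for x, _ in extra]
            y_aug = y_lab + [label for _, label in extra]
            updated.append(CentroidClassifier(learners[i].weights).fit(X_aug, y_aug))
        learners = updated
    return learners


def vote(learners, x):
    """Majority vote of the three base classifiers."""
    return Counter(clf.predict(x) for clf in learners).most_common(1)[0][0]
```

Calling `tri_training(X_lab, y_lab, X_unlab)` with lists of numeric feature vectors returns the three refined learners, and `vote(learners, x)` classifies a new instance. Replacing `scaling_transform` with a constraint-driven transformation would be the natural place to plug in a method such as TMC.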