计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2014
Issue 12
pp. 4329-4334
6 pages
positive and unlabeled learning; decision tree; random forest; ensemble learning; biased support vector machine
To train a classifier from positive and unlabeled examples (PU learning), a PU learning algorithm based on random forests is proposed. The PU decision tree algorithm POSC4.5 is extended to perform random feature selection while a tree is grown. In the training phase, the original PU dataset is sampled with replacement to generate multiple different PU training sets, and a decision tree is trained on each of them using the extended POSC4.5. In the classification phase, the outputs of the trained trees are aggregated by majority vote. Experimental results on UCI datasets show that the classification performance of the proposed method is better than that of the biased support vector machine, POSC4.5, and bagging-based POSC4.5.
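The three steps named in the abstract (sampling with replacement, random feature selection, majority voting) can be illustrated with a minimal sketch. It is not the authors' implementation. POSC4.5 is not reproduced here; a one-split stump that treats unlabeled rows as provisional negatives stands in for it. The function names (`bootstrap`, `train_stump`, `train_pu_forest`, `predict`), the parameters, and the choice to resample the positive and unlabeled sets separately are all assumptions made for illustration.

```python
import random


def bootstrap(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(pos, unl, n_candidates, rng):
    """Stand-in base learner (NOT POSC4.5): a single-split stump.

    Only `n_candidates` randomly chosen features are considered, which
    mimics random feature selection. Unlabeled rows are treated as
    provisional negatives when scoring a split.
    """
    n_features = len(pos[0])
    best = None
    for f in rng.sample(range(n_features), n_candidates):
        mean_pos = sum(r[f] for r in pos) / len(pos)
        mean_unl = sum(r[f] for r in unl) / len(unl)
        thr = (mean_pos + mean_unl) / 2      # midpoint threshold
        sign = 1 if mean_pos >= mean_unl else -1
        correct = sum(sign * (r[f] - thr) > 0 for r in pos)
        correct += sum(sign * (r[f] - thr) <= 0 for r in unl)
        if best is None or correct > best[0]:
            best = (correct, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: 1 if sign * (x[f] - thr) > 0 else 0


def train_pu_forest(pos, unl, n_trees=25, n_candidates=1, seed=0):
    """Train one stump per bootstrap replicate of the PU data."""
    rng = random.Random(seed)
    return [
        train_stump(bootstrap(pos, rng), bootstrap(unl, rng), n_candidates, rng)
        for _ in range(n_trees)
    ]


def predict(forest, x):
    """Majority vote: 1 (positive) if more than half the trees say so."""
    votes = sum(tree(x) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0


if __name__ == "__main__":
    # Toy data: the unlabeled set hides one positive-looking row.
    pos = [(5.0, 5.2), (6.0, 6.1), (5.5, 4.9), (6.5, 5.8)]
    unl = [(0.0, 0.5), (1.0, 0.2), (0.5, 0.8), (5.8, 5.5), (1.2, 0.7), (0.2, 0.1)]
    forest = train_pu_forest(pos, unl)
    print(predict(forest, (7.0, 7.0)), predict(forest, (0.3, 0.3)))
```

An odd `n_trees` avoids tied votes in the binary case. Replacing `train_stump` with a real PU decision tree learner would leave the sampling and voting code unchanged.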