电子学报
Acta Electronica Sinica
2015, No. 10, pp. 1963-1970 (8 pages)
王友卫 (Wang Youwei); 刘元宁 (Liu Yuanning); 凤丽洲 (Feng Lizhou); 朱晓冬 (Zhu Xiaodong)
spam; user interest set; support vector machine; active learning; online application
To improve spam identification speed without significantly sacrificing accuracy, a novel fast online spam identification method is proposed. First, the concepts of a user positive interest set and a user negative interest set are introduced, and emails are classified by combining the user interest sets with a support vector machine (SVM). Second, based on active learning theory, the sample densities of the different categories in the training set and an improved angle diversity method are used to select the most uncertainly classified samples, which are then recommended to the user for labeling. Finally, the newly labeled samples and the samples classified with the greatest certainty are added to the training set, and a newly proposed sample value evaluation function is used to filter out redundant samples and generate the new training set. Experimental results show that the proposed method imposes a small labeling burden on users and identifies spam quickly and with high accuracy, which indicates its high value for online applications.
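The sample-selection step builds on angle diversity. The sketch below shows the widely used angle-diversity criterion for batch-mode SVM active learning (Brinker, 2003). It is not the authors' improved variant, which also weighs training-set sample density in ways the abstract does not specify. The sketch makes several assumptions:

- The SVM decision values f(x) for the unlabeled pool have already been computed.
- The kernel is an RBF kernel with an illustrative gamma.
- The trade-off weight `lam` and all function names are hypothetical.

Each candidate is scored as lam*|f(x)| + (1 - lam) * max over already selected x' of |k(x, x')| / sqrt(k(x, x) k(x', x')). The lowest score is picked greedily at each step.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2); gamma is illustrative."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_cosine(a: Vector, b: Vector, kernel: Kernel) -> float:
    """|cos| of the angle between the feature-space images of a and b."""
    return abs(kernel(a, b)) / math.sqrt(kernel(a, a) * kernel(b, b))


def angle_diversity_select(
    pool: Sequence[Vector],
    decision_values: Sequence[float],
    batch_size: int,
    lam: float = 0.5,
    kernel: Kernel = rbf_kernel,
) -> List[int]:
    """Hypothetical helper: greedily pick indices of uncertain but mutually diverse samples.

    decision_values[i] is assumed to be the SVM output f(x_i) for pool[i].
    """
    selected: List[int] = []
    candidates = list(range(len(pool)))
    while candidates and len(selected) < batch_size:
        def score(i: int) -> float:
            # Uncertainty term: closeness to the hyperplane (smaller = more uncertain).
            uncertainty = abs(decision_values[i])
            # Diversity term: largest angular similarity to any already selected
            # sample (smaller = more diverse); zero while nothing is selected yet.
            similarity = max(
                (kernel_cosine(pool[i], pool[j], kernel) for j in selected),
                default=0.0,
            )
            return lam * uncertainty + (1.0 - lam) * similarity

        best = min(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy pool: samples 0 and 1 are near-duplicates close to the hyperplane.
    pool = [[0.1, 0.0], [0.12, 0.0], [2.0, 2.0], [5.0, 5.0]]
    decision_values = [0.05, 0.04, 0.3, 2.5]
    print(angle_diversity_select(pool, decision_values, batch_size=2))  # -> [1, 2]
```

In the toy run, sample 0 is almost as uncertain as sample 1 but nearly identical to it. The diversity term therefore passes over sample 0 in favour of sample 2. Setting `lam=1.0` reduces the rule to pure closest-to-hyperplane selection, which would return [1, 0] and ask the user to label two redundant emails.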