华南理工大学学报(自然科学版)
Journal of South China University of Technology (Natural Science Edition)
2014
Issue 7
pp. 21-27
7 pages in total
王友卫%刘元宁%凤丽洲%朱晓冬
垃圾邮件%支持向量机%增量学习%主动学习%用户兴趣
spam%support vector machines%incremental learning%active learning%user interest
多数在线垃圾邮件识别方法未有效区分用户针对不同邮件内容的感兴趣程度,导致垃圾邮件识别精度不高。文中提出了一种基于支持向量机的垃圾邮件在线识别新方法。即结合传统增量学习及主动学习理论,先通过随机选择代表样本寻找分类最不确定的样本进行人工标注;接着引入用户兴趣度的概念,提出了新的样本标注模型和算法性能评价标准;最后结合“轮盘赌”方法将标注后样本加入训练样本集。多种对比实验表明,文中方法针对垃圾邮件识别精度高,样本训练及待标注样本选择速度快,具有较高的在线应用价值。
Most online spam identification methods do not effectively distinguish how interested a user is in the content of different emails, which limits identification precision. This paper proposes a new online spam identification method based on the support vector machine (SVM). First, drawing on the theories of incremental learning and active learning, representative samples are selected at random and used to find the samples whose classification is most uncertain, which are then labeled manually. Next, the concept of user interest degree is introduced, and a new sample labeling model and a new criterion for evaluating algorithm performance are proposed. Finally, the “roulette” method is used to add the labeled samples to the training set. Comparative experiments show that the proposed method achieves high spam identification precision and that both sample training and the selection of samples to be labeled are fast, which makes it well suited to online use.
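The abstract names two generic building blocks: querying the samples the classifier is least certain about, and "roulette" (fitness-proportionate) selection of labeled samples for the training set. The following is a minimal sketch of those two ideas only, not the authors' algorithm. It assumes that uncertainty is measured by the absolute value of an SVM decision value and that each labeled sample carries a non-negative weight; the function names `most_uncertain` and `roulette_select` and the weights themselves are hypothetical, since the abstract does not define the labeling model or the user interest degree.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def most_uncertain(decision_values, k):
    """Return indices of the k samples closest to the decision boundary.

    Assumption: a small |decision value| means an uncertain classification.
    """
    order = sorted(range(len(decision_values)),
                   key=lambda i: abs(decision_values[i]))
    return order[:k]


def roulette_select(weights, n, rng):
    """Pick up to n distinct indices with probability proportional to weight.

    Samples with zero weight are never chosen. The weights are hypothetical
    stand-ins for whatever score the selection is based on.
    """
    remaining = [i for i, w in enumerate(weights) if w > 0]
    chosen = []
    while remaining and len(chosen) < n:
        # Cumulative weights define the slices of the roulette wheel.
        cumulative = list(accumulate(weights[i] for i in remaining))
        spin = rng.random() * cumulative[-1]
        pos = min(bisect_right(cumulative, spin), len(remaining) - 1)
        chosen.append(remaining.pop(pos))
    return chosen


if __name__ == "__main__":
    scores = [2.1, -0.05, 0.4, -1.7, 0.1]  # made-up SVM decision values
    print(most_uncertain(scores, 2))       # indices to send for manual labeling
    rng = random.Random(0)
    print(roulette_select([0.0, 1.0, 3.0, 0.5], 2, rng))
```

Drawing without replacement keeps a sample from entering the training set twice; a variant that samples with replacement would skip the `pop` and reuse a single cumulative table.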