内蒙古工业大学学报(自然科学版)
JOURNAL OF INNER MONGOLIA UNIVERSITY OF TECHNOLOGY (NATURAL SCIENCE EDITION)
2013
Issue 3
pp. 209-213
5 pages
邮件向量%垃圾邮件过滤%特征选择%Naive Bayes算法%F1 值
mail vector%spam filtering%feature selection%Naive Bayes algorithm%F1 value
在中文垃圾邮件过滤系统中,基于内容过滤的Naïve Bayes算法得到了广泛应用。本文将多种特征结合构建邮件文本向量,应用八种文本分类特征选择方法在Naïve Bayes算法上进行实验验证,通过准确率和召回率结合的综合性能指标F1值进行性能评价,结果表明,采用类别区分词、优势率、信息增益、期望交叉熵、CHI统计和文本证据权等六种特征选择方法应用于多特征结合邮件文本向量的过滤取得了较好的垃圾邮件过滤性能,反垃圾邮件效果较好。
In Chinese spam filtering systems, the content-based Naive Bayes algorithm has been widely used. This paper combines multiple features to build the e-mail text vector and experimentally evaluates eight text-classification feature selection methods with the Naive Bayes algorithm. Performance is measured by the F1 value, a comprehensive indicator that combines precision and recall. The results show that six of the feature selection methods (category-discriminating words, odds ratio, information gain, expected cross entropy, the CHI statistic, and weight of evidence for text) achieve good spam filtering performance when applied to the multi-feature e-mail text vector.
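
For readers unfamiliar with two of the quantities named in the abstract, the following is a minimal Python sketch of the standard textbook definitions of the CHI statistic (for one term against the spam class) and the F1 value. It is not the authors' implementation; the function names `chi_square` and `f1_score` and the count arguments are hypothetical and chosen only for illustration.

```python
def chi_square(n11, n10, n01, n00):
    """CHI statistic of a term for the spam class from a 2x2 table.

    n11: spam messages containing the term
    n10: legitimate messages containing the term
    n01: spam messages without the term
    n00: legitimate messages without the term
    (Argument names are assumptions made for this sketch.)
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def f1_score(tp, fp, fn):
    """F1 value: harmonic mean of precision and recall.

    tp: spam correctly flagged, fp: legitimate mail flagged as spam,
    fn: spam that was missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A term that appears only in spam gets a high CHI score and would be kept by a CHI-based selector, while a term spread evenly across both classes scores zero. The F1 value rewards a filter only when precision and recall are both high.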