电脑知识与技术
COMPUTER KNOWLEDGE AND TECHNOLOGY
2014
Issue 34
pp. 8119-8121
3 pages in total
Hadoop%MapReduce%massive%spam%classification
To detect spam in large volumes of email, an email classification method based on the Hadoop platform is proposed. Unlike traditional content-based spam detection, the proposed method statistically analyzes email sending and receiving records on the MapReduce framework to extract behavioral features of each email account. A Random Forests classifier is then implemented in parallel on the MapReduce framework, trained on samples carrying the extracted behavioral features, and used to classify emails. Experimental results show that the Hadoop-based email classification method greatly improves the efficiency of classifying large-scale email.
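The behavioral feature extraction step described in the abstract can be illustrated with a minimal, single-process sketch of the map, shuffle and reduce phases. This is not the authors' implementation: the record layout (sender, recipient, timestamp), the function names, and the three features computed here are assumptions chosen only for illustration, since the abstract does not list the actual features used.

```python
from collections import defaultdict

# Hypothetical log record: (sender, recipient, timestamp_in_seconds).
# The real record format is not given in the abstract.

def map_record(record):
    """Map step: emit one (sender account, recipient) pair per message."""
    sender, recipient, _timestamp = record
    yield sender, recipient

def shuffle(pairs):
    """Group mapped values by key, as the MapReduce shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_account(account, recipients):
    """Reduce step: summarize one account's sending behavior.

    These three features are illustrative assumptions only.
    """
    sent = len(recipients)
    distinct = len(set(recipients))
    return account, {
        "sent": sent,
        "distinct_recipients": distinct,
        "recipient_ratio": distinct / sent,
    }

def extract_features(records):
    """Run map, shuffle and reduce in sequence over an in-memory log."""
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_account(k, v) for k, v in shuffle(pairs).items())

records = [
    ("a@example.com", "x@example.com", 0),
    ("a@example.com", "y@example.com", 5),
    ("a@example.com", "z@example.com", 9),
    ("b@example.com", "x@example.com", 3),
    ("b@example.com", "x@example.com", 70),
]
features = extract_features(records)
```

On a real Hadoop cluster the map and reduce functions would run as distributed tasks over the log files, and the resulting per-account feature vectors would then serve as training samples for the classifier.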