Computer Engineering (计算机工程)
2013
Issue 11
pp. 57-60, 64
5 pages in total
cloud computing; MapReduce model; Hadoop framework; Bayesian algorithm; spam mail; anti-spam mail filtering
Filtering massive volumes of mail in traditional large-scale distributed mail systems is difficult to program, inefficient, and resource-intensive during the initial training phase. To address these problems, the traditional Bayesian filtering algorithm is parallelized and, exploiting the strengths of the cloud computing MapReduce model in processing massive data, a MapReduce model for Bayesian mail filtering is designed on the open-source Hadoop cloud framework to optimize the mail training and filtering processes. Experimental results show that, compared with the traditional distributed computing model, the proposed model performs better in recall, precision, and accuracy, while reducing the cost of mail filtering and improving system execution efficiency.
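The abstract does not give the details of the model, so the following is only a minimal sketch of how Bayesian training can be phrased as a map step and a reduce step. It is not the authors' implementation and does not use the Hadoop API: the shuffle is simulated in memory, and the function names (`map_phase`, `reduce_phase`, `train`, `classify`), the `__docs__` counter key, the whitespace tokenizer, the document-frequency counting, and the Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict

DOCS = "__docs__"  # hypothetical marker key for per-label document counts


def map_phase(label, text):
    """Mapper: emit one count per document and one per distinct token."""
    yield (DOCS, label), 1
    for token in set(text.lower().split()):
        yield (label, token), 1


def reduce_phase(pairs):
    """Reducer: sum the values of each key (the shuffle is simulated by a dict)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def train(corpus):
    """Run the map step over every (label, text) pair, then reduce."""
    pairs = (kv for label, text in corpus for kv in map_phase(label, text))
    return reduce_phase(pairs)


def classify(counts, text, labels=("spam", "ham")):
    """Pick the label with the highest smoothed log-probability.

    Only tokens present in the message are scored, which is a common
    simplification and an assumption of this sketch.
    """
    total_docs = sum(counts.get((DOCS, label), 0) for label in labels)
    scores = {}
    for label in labels:
        n_docs = counts.get((DOCS, label), 0)
        # Smoothed class prior.
        score = math.log((n_docs + 1) / (total_docs + len(labels)))
        for token in set(text.lower().split()):
            n_token = counts.get((label, token), 0)
            # Laplace-smoothed share of this label's documents containing the token.
            score += math.log((n_token + 1) / (n_docs + 2))
        scores[label] = score
    return max(scores, key=scores.get)


corpus = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize win"),
    ("ham", "meeting agenda for monday"),
    ("ham", "project meeting notes"),
]
counts = train(corpus)
print(classify(counts, "win a prize"))
print(classify(counts, "monday meeting"))
```

Because the reducer only sums integers per key, the same logic could in principle be split across many mappers and reducers, which is the property a MapReduce formulation of training relies on.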