COMPUTER TECHNOLOGY AND DEVELOPMENT
2013
Issue 3
pp. 23-26
4 pages in total
Zhang Yiyang%Xiang Yang%Jiang Ruiquan%Zhang Bo%Zhang Junying
Naïve Bayes algorithm%parallel computing%MapReduce
Naïve Bayes is an efficient classification algorithm, but when it processes massive data sets its efficiency is severely limited by memory and I/O resources. Based on the characteristics of the Naïve Bayes classification algorithm, this paper proposes an implementation built on the MapReduce programming model. The files in the training set are split before processing, and the core processing is carried out by MapReduce: the Map function parses the training files, and the Reduce function builds the knowledge base of class attributes and feature attributes. The experiments mainly compare the performance of the traditional algorithm with that of the improved parallel algorithm. The results show that, for large data volumes, the MapReduce-parallelized Naïve Bayes algorithm has good execution efficiency and high scalability.
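The division of labor described in the abstract (Map parses training files, Reduce builds the class and feature knowledge base) can be illustrated with a minimal single-process sketch in Python. This is not the paper's implementation: the function names, the key layout, whitespace tokenization, and add-one smoothing are all assumptions made for illustration, and a real deployment would run the map and reduce steps on a MapReduce framework rather than in one process.

```python
from collections import defaultdict
from math import log


def map_document(label, text):
    """Map step (hypothetical): parse one training document into (key, 1) pairs."""
    yield ("class", label), 1
    for token in text.split():  # whitespace tokenization is an assumption
        yield ("feature", label, token), 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum values per key to build the knowledge base."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def train(documents):
    """Apply the map step to every document, then reduce all emitted pairs."""
    emitted = (
        pair
        for label, text in documents
        for pair in map_document(label, text)
    )
    return reduce_counts(emitted)


def classify(counts, text):
    """Return the class with the highest log-posterior.

    Add-one (Laplace) smoothing is an assumption, not taken from the paper.
    """
    labels = [key[1] for key in counts if key[0] == "class"]
    vocab = {key[2] for key in counts if key[0] == "feature"}
    total_docs = sum(counts[("class", label)] for label in labels)
    best_label, best_score = None, float("-inf")
    for label in labels:
        tokens_in_class = sum(
            value
            for key, value in counts.items()
            if key[0] == "feature" and key[1] == label
        )
        score = log(counts[("class", label)] / total_docs)
        for token in text.split():
            hits = counts.get(("feature", label, token), 0)
            score += log((hits + 1) / (tokens_in_class + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented purely for illustration.
documents = [
    ("spam", "win money now"),
    ("spam", "win prize"),
    ("ham", "meeting at noon"),
    ("ham", "lunch at noon"),
]
counts = train(documents)
```

Because each emitted pair is just a key and a count, the reduce step is a plain sum, which is what lets the training files be split and processed independently before their counts are merged.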