Journal of Guilin University of Technology (桂林理工大学学报)
2013
Issue 4
pp. 765-769 (5 pages)
support vector machines; genetic algorithm; Hadoop; data classification
When Support Vector Machines (SVM) are used to classify large-scale data on a traditional platform, the platform's computing power usually hits scalability bottlenecks and the classification process cannot be parallelized efficiently, which leads to low classification efficiency. The values of the SVM model parameters and kernel function parameters decisively affect classification accuracy, but they are mostly chosen at random or from experience, which results in low classification accuracy. To improve both the accuracy and the efficiency of data classification, an elastic cloud computing cluster platform is used to overcome the scalability bottleneck in computing power, and SVM is parallelized into Map and Reduce stages with the MapReduce programming model. A genetic algorithm (GA) based on optimization theory is introduced into the SVM classification algorithm to optimize the classifier parameters; the classifier's accuracy serves as the GA fitness function in the search for globally optimal values of the SVM model parameters and kernel function parameters. Experiments on the open-source cloud computing platform Hadoop show that classification accuracy improves significantly and that the speedup of the overall classification process grows almost linearly.
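The parameter search described in the abstract can be pictured with a minimal Python sketch of a genetic algorithm that maximizes a fitness function over two SVM parameters. The abstract does not give the encoding, the genetic operators, or the parameter ranges, so everything here is an assumption made for illustration: the real-valued genes `log2(C)` and `log2(gamma)`, the search bounds, tournament selection, arithmetic crossover, and Gaussian mutation. The function `surrogate_accuracy` is a hypothetical stand-in for the classifier accuracy that the paper uses as fitness, included only so the sketch runs without an SVM library.

```python
import math
import random

# Assumed search ranges for log2(C) and log2(gamma); not taken from the paper.
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def surrogate_accuracy(log2_c, log2_gamma):
    """Hypothetical stand-in for SVM accuracy (NOT from the paper).

    A smooth bump peaking at log2(C)=3, log2(gamma)=-5. Replace it with a
    function that trains an SVM and returns its validation accuracy.
    """
    return math.exp(-((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2) / 50.0)


def clip(value, low, high):
    """Keep a gene inside its search range."""
    return max(low, min(high, value))


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def ga_search(fitness, bounds=BOUNDS, pop_size=20, generations=40,
              mutation_rate=0.2, seed=0):
    """Maximize fitness(*genes) with a simple real-coded GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(*ind) for ind in population]
        # Remember the best individual evaluated so far.
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = list(ind), score
        # Elitism: the best individual always survives.
        next_population = [list(best)]
        while len(next_population) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            # Arithmetic crossover: blend both parents with one random weight.
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped back into the search range.
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[g] = clip(child[g] + rng.gauss(0.0, 1.0), lo, hi)
            next_population.append(child)
        population = next_population
    return best, best_score


if __name__ == "__main__":
    params, score = ga_search(surrogate_accuracy)
    print("log2(C), log2(gamma):", params, "fitness:", score)
```

To use the sketch on real data, `fitness` would train an SVM with the candidate parameters and return its accuracy on held-out data; the GA loop itself would not need to change.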