Computer Engineering and Applications (计算机工程与应用)
2014, No. 17, pp. 44-48, 55 (6 pages)
Keywords: parallel Support Vector Machines (SVM); large-scale datasets; limited resources; random Fourier features; consensus centre adjustment
Support Vector Machines (SVMs) have become popular classification tools, but on very large datasets they require large amounts of memory and long training times, so large-scale SVMs are usually trained on computer clusters or supercomputers. A novel parallel algorithm for large-scale SVM, RF-CCASVM, is presented; it runs in a resource-limited computing environment and guarantees uniform convergence. The infinite-dimensional implicit feature mapping of the Gaussian kernel is uniformly approximated by a low-dimensional explicit feature mapping, so the kernel SVM is approximated by a linear SVM after the data are explicitly mapped to low-dimensional features with the random Fourier map. The parallelization is implemented with a consensus-centre-adjustment strategy. Concretely, the dataset is partitioned into several subsets, and separate SVMs are trained in parallel on the subsets. When the optimal hyperplanes on the subsets are nearly found, the separate solutions are replaced by their consensus centre, and training continues on each subset until the consensus centre is optimal on all subsets. Comparative experiments on benchmark datasets show that the proposed resource-limited parallel algorithm is both effective and efficient.
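The random Fourier map mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation, only a standard construction of random Fourier features (Rahimi and Recht): frequencies are drawn from the Fourier transform of the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2), and inner products of the explicit low-dimensional features approximate the kernel. The function name and parameters below are illustrative, not from the paper.

```python
import numpy as np

def random_fourier_features(X, D, gamma, rng):
    """Map X (n, d) to D explicit features whose inner products
    approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    # Random phases uniform on [0, 2*pi)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
gamma = 0.5

Z = random_fourier_features(X, D=5000, gamma=gamma, rng=rng)

# Exact Gaussian kernel matrix for comparison
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-gamma * sq_dists)

# Approximate kernel matrix from explicit features
K_approx = Z @ Z.T
err = np.abs(K_exact - K_approx).max()
```

After this mapping, a linear SVM trained on Z stands in for the Gaussian-kernel SVM, which is what makes the paper's parallel training on data subsets feasible on limited resources: each process only handles a linear model in D dimensions.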