CAJ | Academic Paper

Standard support vector machine (SVM) training takes O(l^3) time and O(l^2) space, where l is the number of training samples, which makes it computationally infeasible on very large data sets. This paper presents the Approximate Vector Machine (AVM), an SVM training algorithm based on an approximate solution, to scale kernel methods up to such data sets. AVM uses incremental learning to find an approximately optimal separating hyperplane, and applies warm-start ("hot start") and sampling (probabilistic speedup) techniques during the iterations to accelerate training. Theoretical analysis shows that the time and space complexities of AVM are independent of the training set size, so the algorithm scales well in both respects. Experiments on very large data sets show that AVM greatly speeds up training while preserving the generalization performance of the original SVM classifier, and that it outperforms existing scale-up methods in training time and number of support vectors. Because the trained model has fewer support vectors, the resulting classifier is also faster at classification.
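The abstract does not give the AVM update rule, so the following is only a minimal sketch of the general pattern it describes: an incremental loop that inspects a fixed-size random sample per iteration and carries its coefficients from one iteration to the next. Everything in it is an assumption made for illustration. The function names (`rbf`, `decision`, `train_incremental`, `make_blobs`) are hypothetical, the sample size of 59 is borrowed from the usual probabilistic-speedup argument rather than from this paper, and a simple kernel-perceptron step stands in for whatever solver AVM actually uses.

```python
import math
import random


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def decision(x, support, alpha):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) over the current support set."""
    return sum(a * y * rbf(sx, x) for (sx, y), a in zip(support, alpha))


def train_incremental(data, sample_size=59, max_iter=200, seed=0):
    """Grow a support set by at most one point per iteration.

    data: list of (x, y) pairs, x a tuple of floats, y in {-1, +1}.
    Each iteration looks at `sample_size` random points only, so the number
    of kernel evaluations per iteration depends on the sample size and the
    current support-set size, not on len(data).
    """
    rng = random.Random(seed)
    support, alpha = [], []  # kept across iterations: this is the warm start
    for _ in range(max_iter):
        batch = rng.sample(data, min(sample_size, len(data)))
        # Most violated sampled point: smallest functional margin y * f(x).
        scored = [(y * decision(x, support, alpha), (x, y)) for x, y in batch]
        margin, worst = min(scored, key=lambda t: t[0])
        if margin > 0:
            break  # no sampled point is misclassified: accept the approximation
        # Stand-in update (kernel perceptron step), NOT the paper's solver.
        if worst in support:
            alpha[support.index(worst)] += 1.0
        else:
            support.append(worst)
            alpha.append(1.0)
    return support, alpha


def make_blobs(n, seed=1):
    """Hypothetical toy data: two well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(((rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5)), y))
    return data


if __name__ == "__main__":
    data = make_blobs(2000)
    support, alpha = train_incremental(data)
    correct = sum(1 for x, y in data if y * decision(x, support, alpha) > 0)
    print(len(support), "support vectors, training accuracy", correct / len(data))
```

The stopping test only guarantees that the sampled points are classified correctly, which is why the result is an approximate rather than an exact solution. A real implementation would replace the perceptron step with a proper optimization over the working set and would add a tolerance on the margin.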