计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2015
Issue 3
pp. 124-128
5 pages in total
Support Vector Machine (SVM)%machine learning%text classification%classification model%Karush-Kuhn-Tucker (KKT) conditions
To address the long training time and low classification efficiency of the incremental support vector machine (I-SVM) algorithm in text classification, this paper proposes TI-SVM, an optimized I-SVM algorithm based on support vector (SV) threshold control. Because incremental training sample sets contain many non-SVs, TI-SVM uses the historical training model and the Karush-Kuhn-Tucker (KKT) conditions to preprocess both the new sample set and the historical sample set, removing most of the non-SVs. A new SVM model is then trained on the preprocessed sample set, and redundant SVs in that model are further processed using text similarity and a preset SV threshold to improve classification performance. Experiments on a customer news classification task show that the algorithm effectively improves model training and classification efficiency while maintaining classification accuracy.
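
The two filtering steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. The historical model is taken to be a linear decision function f(x) = w·x + b, a sample counts as a likely non-SV when y·f(x) > 1 (the standard KKT case with a zero multiplier), text similarity is taken to be cosine similarity, and the "preset SV threshold" is read as a count limit above which near-duplicate SVs are dropped. The names `kkt_filter`, `prune_support_vectors`, `sv_limit`, and `sim_threshold` are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Linear decision function f(x) = w.x + b (assumed stand-in for the historical model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def kkt_filter(samples, w, b, margin=1.0):
    """Keep samples with y * f(x) <= margin.

    Samples strictly outside the margin satisfy the KKT conditions with a
    zero multiplier under the old model, so they are treated as non-SVs.
    """
    return [(x, y) for x, y in samples if y * decision_value(w, b, x) <= margin]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prune_support_vectors(svs, sv_limit, sim_threshold=0.95):
    """Assumed pruning rule: only when the SV count exceeds sv_limit,
    greedily drop any SV that is nearly identical (same label, cosine
    similarity >= sim_threshold) to an SV that was already kept."""
    if len(svs) <= sv_limit:
        return list(svs)
    kept = []
    for x, y in svs:
        if any(y == ky and cosine(x, kx) >= sim_threshold for kx, ky in kept):
            continue
        kept.append((x, y))
    return kept


# Toy usage with an assumed historical model w = [1, 0], b = 0.
w, b = [1.0, 0.0], 0.0
samples = [([2.0, 0.0], 1), ([0.5, 1.0], 1), ([-3.0, 0.0], -1), ([0.2, 0.0], -1)]
candidates = kkt_filter(samples, w, b)

svs = [([1.0, 0.0], 1), ([2.0, 0.0], 1), ([0.0, 1.0], 1), ([1.0, 0.0], -1)]
pruned = prune_support_vectors(svs, sv_limit=2)
```

In a real incremental setting the retained candidates would be merged with the previous support vectors and passed to an SVM trainer. That step is omitted here because the abstract does not specify the kernel or the solver.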