计算机技术与发展
COMPUTER TECHNOLOGY AND DEVELOPMENT
2014
Issue 5
pp. 65-69
5 pages in total
data dimension reduction%kernel principal component analysis%support vector machine%social spam
When high-dimensional data are processed, the number of samples required grows exponentially with the dimension, while the distances between samples become less informative. This is the curse of dimensionality. Text tag data usually suffer from excessively high dimensionality, which hampers the detection of spam tags (social spam). This paper uses the mathematical model of the support vector machine (SVM) to construct a large-scale social spam detection model for Folksonomy. To reduce the influence of high dimensionality on spam detection, and inspired by kernel principal component analysis (KPCA) theory, the idea of dimension reduction is introduced into the field of data reduction, and a KPCA-based reduction model for large-scale SVM data sets is proposed. Instantiating this model yields a new social spam detection method: a large-scale social spam detection model based on kernel principal component analysis and the support vector machine (KPCA-SVM). The model shortens test time and achieves good detection performance without affecting the characteristics of the data.
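The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: reduce the feature dimension with kernel PCA, then train an SVM on the reduced features. It assumes scikit-learn is available. The RBF kernel, the 10 retained components, the linear SVM, the helper name `build_kpca_svm`, and the synthetic data are all illustrative assumptions and are not taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_kpca_svm(n_components=10, gamma=None):
    """Chain scaling, kernel PCA, and an SVM classifier (hypothetical settings)."""
    return make_pipeline(
        StandardScaler(),
        # Nonlinear dimension reduction; gamma=None means 1 / n_features.
        KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma),
        # The classifier only sees the reduced representation.
        SVC(kernel="linear"),
    )


# Synthetic stand-in for high-dimensional tag features (not real Folksonomy data).
X, y = make_classification(
    n_samples=400, n_features=200, n_informative=20, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_kpca_svm(n_components=10)
model.fit(X_train, y_train)

# Apply only the scaling and kernel PCA steps to inspect the reduced features.
X_test_reduced = model[:-1].transform(X_test)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print("reduced shape:", X_test_reduced.shape)
print("held-out accuracy:", accuracy)
```

In this sketch the classifier works on 10 features instead of 200, which is the kind of reduction the abstract credits with shortening test time. Whether accuracy is preserved depends on the data and on the kernel settings, which would need tuning for any real tag data set.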