科技通报
BULLETIN OF SCIENCE AND TECHNOLOGY
2015, Issue 8, pp. 129-131 (3 pages)
Keywords: big data; hierarchical tree building; clustering algorithm
Abstract: Hierarchical tree-building clustering of big data improves big-data detection and the fault-analysis capability of big-data application systems. Traditional methods perform hierarchical clustering of big data with the K-Means algorithm, which easily falls into local convergence and clusters poorly. A hierarchical tree-building clustering method based on the core vector machine is proposed. A quadtree algorithm is used to preprocess the multidimensional data, and the cluster centers are extended over the KNN central region. To handle the overlap between class domains in big data, a core vector machine differential comparison is performed, yielding a KNN fuzzy partition matrix. Known samples are stratified by class, giving a one-dimensional and a two-dimensional differential hierarchical tree-building model. The similarity features between the data core vectors are then computed, and the matrix is filled with the fuzzy-set closeness degree of the data points, which improves the clustering algorithm. Simulation results show that the algorithm has superior big-data clustering performance and good convergence. Applied to online network fault diagnosis, it tracks and recovers fault signals, improves fault diagnosis efficiency, and shows good application value.
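The abstract mentions a KNN fuzzy partition matrix but does not say how it is computed. The sketch below is one common way to build such a matrix, using distance-weighted fuzzy k-nearest-neighbor memberships. It is an illustrative assumption, not the paper's method: the function name `knn_fuzzy_partition`, the fuzzifier `m`, and the toy data are all hypothetical, and the core vector machine differential comparison described in the abstract is not modeled.

```python
import math

def knn_fuzzy_partition(samples, labeled, k=3, m=2.0, eps=1e-12):
    """Return (classes, U), where U[i][c] is the fuzzy membership of
    samples[i] in classes[c]. Each row of U sums to 1.

    Assumption: memberships are inverse-distance weighted votes of the
    k nearest labeled points (a standard fuzzy k-NN scheme).
    """
    classes = sorted({label for _, label in labeled})
    exponent = 2.0 / (m - 1.0)  # m > 1 controls how sharply distance is weighted
    matrix = []
    for x in samples:
        # k nearest labeled points by Euclidean distance
        nearest = sorted(labeled, key=lambda pl: math.dist(x, pl[0]))[:k]
        # eps avoids division by zero when x coincides with a labeled point
        weights = [1.0 / (math.dist(x, p) ** exponent + eps) for p, _ in nearest]
        total = sum(weights)
        row = [
            sum(w for w, (_, label) in zip(weights, nearest) if label == c) / total
            for c in classes
        ]
        matrix.append(row)
    return classes, matrix

# Toy data (hypothetical): two well-separated classes in 2-D.
labeled = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
           ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
classes, U = knn_fuzzy_partition([(0.5, 0.5), (2.5, 3.0)], labeled, k=3)
for row in U:
    print([round(u, 3) for u in row])
```

A point near one class gets a membership close to 1 for that class, while a point between the two classes gets a softer split. That soft split is the information a hard K-Means assignment discards.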