计算机技术与发展
COMPUTER TECHNOLOGY AND DEVELOPMENT
2013
Issue 2
60-64
5 pages
grid%clustering%data mining%MapReduce parallelization
Motivated by the incremental growth of clustering data and by the parallel processing model of cloud computing, this paper studies the MapReduce parallelization of a grid-based clustering algorithm. The algorithm first preprocesses the data with a grid technique, replacing all the points stored in each grid cell with that cell's center of gravity. Under the MapReduce framework, these centers of gravity then serve as the basic data units for clustering analysis. Experimental results show that the parallelized algorithm, deployed on a Hadoop cluster, produces the same clustering results as the original algorithm while reducing analysis time and computational complexity. It is therefore suitable for analyzing and mining massive, incrementally growing, high-dimensional data.
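The grid preprocessing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the uniform cell size, the function names (`map_point`, `reduce_cell`, `grid_centroids`) and the sample points are all assumptions made for illustration, and the shuffle phase is simulated locally with a dictionary rather than run on Hadoop. The subsequent clustering of the centers of gravity is not shown, since the abstract does not specify it.

```python
from collections import defaultdict
from math import floor


def map_point(point, cell_size):
    """Map phase (hypothetical): key each point by the grid cell it falls into."""
    cell = tuple(floor(x / cell_size) for x in point)
    return cell, (tuple(point), 1)


def reduce_cell(values):
    """Reduce phase (hypothetical): average the points of one cell.

    Returns the cell's center of gravity and the number of points it replaces.
    """
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for coords, n in values:
        for i in range(dims):
            total[i] += coords[i]
        count += n
    return tuple(t / count for t in total), count


def grid_centroids(points, cell_size):
    """Simulate the shuffle step locally by grouping mapper output by cell."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, cell_size)
        groups[key].append(value)
    return {cell: reduce_cell(vals) for cell, vals in groups.items()}


# Assumed sample data: two points share cell (0, 0), the others are alone.
points = [(0.25, 0.25), (0.75, 0.75), (2.5, 2.5), (3.0, 3.5)]
centroids = grid_centroids(points, cell_size=1.0)
```

In this sketch the four input points collapse to three centers of gravity, which is the data reduction the abstract credits for the shorter analysis time. Keeping the point count alongside each center would let a later clustering step weight dense cells more heavily, although whether the paper does so is not stated.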