铁路计算机应用
RAILWAY COMPUTER APPLICATION
2015
Issue 4
pp. 1-4, 8
5 pages in total
massive time series data; grid clustering; MapReduce; LOF; clustering radius
To mine outliers in massive time series data, this paper combines grid clustering with the MapReduce programming model to exclude grids that cannot contain outliers, and then applies the LOF algorithm to detect outliers among the data in the remaining grids. To improve the detection accuracy of grid clustering, the paper proposes an improved algorithm based on clustering radius. Experiments demonstrate the effectiveness of the improved algorithm. The paper also analyzes the time taken by grid clustering with different numbers of nodes, showing that MapReduce-based grid clustering is suitable for processing massive time series data.
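The two-stage idea in the abstract (prune dense grids, then run LOF on what remains) can be illustrated with a minimal single-machine sketch. This is not the paper's implementation: the function names, the `cell_size` and `dense_threshold` parameters, and the choice to draw LOF neighbours from the full data set are all assumptions made for illustration, and the clustering-radius improvement is not shown because the abstract does not describe it. The counting step is written in a map-then-reduce style but runs in plain Python rather than on a MapReduce cluster. Points are assumed to be distinct tuples, and ties at the k-th distance are cut off at exactly k neighbours, which simplifies the standard LOF definition.

```python
from collections import Counter
from math import dist, floor


def cell_of(point, cell_size):
    """'Map' step: key a point by the grid cell that contains it."""
    return tuple(floor(x / cell_size) for x in point)


def count_cells(points, cell_size):
    """'Reduce' step: count how many points fall into each cell."""
    return Counter(cell_of(p, cell_size) for p in points)


def candidate_points(points, cell_size, dense_threshold):
    """Drop points in dense cells (assumed outlier-free); keep the rest."""
    counts = count_cells(points, cell_size)
    return [p for p in points if counts[cell_of(p, cell_size)] < dense_threshold]


def lof_scores(candidates, points, k):
    """LOF of each candidate; neighbours come from all points (assumption)."""
    cache = {}

    def knn(p):
        # k nearest neighbours of p as (distance, neighbour) pairs
        if p not in cache:
            cache[p] = sorted((dist(p, q), q) for q in points if q != p)[:k]
        return cache[p]

    def k_distance(p):
        return knn(p)[-1][0]

    def lrd(p):
        # local reachability density: inverse of the mean reachability distance
        reach = [max(k_distance(o), d) for d, o in knn(p)]
        return len(reach) / sum(reach)

    return {
        p: sum(lrd(o) for _, o in knn(p)) / (len(knn(p)) * lrd(p))
        for p in candidates
    }


# Hypothetical data: a dense 10 x 10 block of points plus one distant point.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
points.append((50.0, 50.0))

candidates = candidate_points(points, cell_size=10.0, dense_threshold=5)
scores = lof_scores(candidates, points, k=3)
```

In this toy data set the dense block falls into a single cell and is pruned, so LOF is computed only for the one remaining point, whose score is far above 1. In a distributed setting the per-cell counts would presumably be produced by mapper and reducer tasks, with the pruning and LOF stages applied to their output.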