计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2013
Issue 24
pp. 122-129
8 pages in total
李桥%周莹莲%黄胜%马翔
离群数据挖掘%角度%随机投影算法%接近线性时间%可靠性%效率
outlier data mining%angle%random projection algorithm%near-linear time%reliability%efficiency
d 维点集离群数据挖掘技术是目前数据挖掘领域的研究热点之一。当前基于距离或最近邻概念进行离群数据挖掘时,在高维数据情况下的挖掘效果不佳,鉴于此,将基于角度的离群因子应用到高维离群数据挖掘中,提出一种新的基于随机投影算法的离群数据挖掘方案,它只需要用接近线性时间的方法就能预测所有数据点的基于角度的离群因子。该方法可以用于并行环境进行并行加速。对近似质量进行了理论分析,以保证算法的可靠性。合成和真实数据集实验结果表明,对超高维数据集,该方法效率高、可伸缩性强。
Outlier mining in d-dimensional point sets is currently an active research topic in data mining. Existing approaches based on distance or nearest-neighbor concepts perform poorly on high-dimensional data. To address this problem, this paper applies the angle-based outlier factor to high-dimensional outlier mining and proposes a novel technique based on random projection that estimates the angle-based outlier factor of all data points in time near-linear in the size of the data. The approach can also be run in a parallel environment to obtain a parallel speedup. A theoretical analysis of the approximation quality is given to guarantee the reliability of the algorithm. Experiments on synthetic and real-world data sets show that the approach is efficient and scalable on very large high-dimensional data sets.
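The abstract does not define the angle-based outlier factor, so the following is only an illustrative sketch under one common assumption: a point's score is the variance of the angles it forms with all pairs of other points, and a small variance suggests an outlier, because every other point then lies in roughly the same direction. This is the brute-force cubic-time baseline, not the paper's near-linear random projection estimator. The function names `angle_variance` and `rank_outliers` are hypothetical.

```python
import math
from itertools import combinations


def angle_variance(points, i):
    """Variance of the angles at points[i] over all pairs of other points.

    Assumed scoring rule (not taken from the abstract): low variance
    means the point sees the rest of the data within a narrow cone.
    """
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    angles = []
    for a, b in combinations(others, 2):
        u = [x - y for x, y in zip(a, p)]  # difference vector a - p
        v = [x - y for x, y in zip(b, p)]  # difference vector b - p
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        if nu == 0.0 or nv == 0.0:
            continue  # a or b coincides with p, so the angle is undefined
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        # Clamp to [-1, 1] to guard against floating-point rounding.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


def rank_outliers(points):
    """Indices sorted from most to least outlying (ascending variance)."""
    scores = [angle_variance(points, i) for i in range(len(points))]
    return sorted(range(len(points)), key=lambda i: scores[i])
```

Scoring all n points this way costs O(n^3) angle computations, which is the cost that motivates an approximate estimator such as the one the abstract describes.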