计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2009
Issue 21
pp. 41-43, 50
4 pages in total
entropy measurement%spatial neighborhood outlier detection%spatial neighborhood outlier factor%space partitioning
Outlier detection algorithms fall mainly into two classes. The first targets statistical data and treats all attributes as a single multi-dimensional space, without distinguishing spatial from non-spatial dimensions; such algorithms may make incorrect judgments or report meaningless outliers. The second targets spatial data and does distinguish spatial from non-spatial dimensions, but these algorithms are either too inefficient or unable to detect neighborhood outliers. To overcome these shortcomings, this paper introduces the concept of entropy weight and proposes a new entropy-weight-based algorithm for measuring spatial neighborhood outliers. The algorithm partitions the data into spatial neighborhoods using a spatial index over the spatial attributes, uses entropy theory to determine the weights of the non-spatial attributes, and computes a spatial neighborhood outlier factor from the non-spatial attributes, by which spatial neighborhood outliers are measured. Theoretical analysis shows that the algorithm is reasonable. Experimental results show that it depends little on user input and achieves high detection accuracy and computational efficiency.
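The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a uniform grid as the spatial index, the standard entropy-weight method for the non-spatial attributes, and a weighted absolute deviation from the neighborhood mean as the outlier factor. The names `entropy_weights`, `grid_cells`, `neighborhood_outlier_factors`, and the parameter `cell_size` are all hypothetical.

```python
import math
from collections import defaultdict


def entropy_weights(attrs):
    """Standard entropy-weight method (an assumption, not taken from the paper).

    Attributes whose normalised values are spread less evenly over the
    objects have lower entropy and therefore receive a larger weight.
    Requires at least two rows.
    """
    n, m = len(attrs), len(attrs[0])
    divergences = []
    for j in range(m):
        col = [row[j] for row in attrs]
        lo, span = min(col), (max(col) - min(col)) or 1.0
        # Min-max normalise, shifted slightly so every share is positive.
        norm = [(v - lo) / span + 1e-9 for v in col]
        total = sum(norm)
        entropy = -sum((v / total) * math.log(v / total) for v in norm) / math.log(n)
        divergences.append(max(0.0, 1.0 - entropy))
    s = sum(divergences)
    if s < 1e-12:  # no attribute carries information: fall back to equal weights
        return [1.0 / m] * m
    return [d / s for d in divergences]


def grid_cells(points, cell_size):
    """Uniform grid standing in for the spatial index: cell -> point indices."""
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(i)
    return cells


def neighborhood_outlier_factors(points, attrs, cell_size):
    """Weighted deviation of each object's non-spatial attributes from the
    mean of its spatial neighbors (own grid cell plus the 8 adjacent cells)."""
    weights = entropy_weights(attrs)
    m = len(weights)
    lows = [min(row[j] for row in attrs) for j in range(m)]
    spans = [(max(row[j] for row in attrs) - lows[j]) or 1.0 for j in range(m)]
    norm = [[(row[j] - lows[j]) / spans[j] for j in range(m)] for row in attrs]
    cells = grid_cells(points, cell_size)
    factors = []
    for i, (x, y) in enumerate(points):
        cx, cy = int(x // cell_size), int(y // cell_size)
        nbrs = [k
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                for k in cells.get((cx + dx, cy + dy), [])
                if k != i]
        if not nbrs:  # isolated object: no neighborhood to deviate from
            factors.append(0.0)
            continue
        factor = 0.0
        for j in range(m):
            mean_j = sum(norm[k][j] for k in nbrs) / len(nbrs)
            factor += weights[j] * abs(norm[i][j] - mean_j)
        factors.append(factor)
    return factors


# Two spatial clusters; object 3 has a first attribute (30) that is ordinary
# globally (values range from 10 to 51) but unusual among its spatial neighbors.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),
          (5.1, 5.2), (5.3, 5.1), (5.2, 5.4), (5.4, 5.3)]
attrs = [[10, 1.0], [11, 1.1], [10, 0.9], [30, 1.0],
         [50, 3.0], [51, 3.1], [49, 2.9], [50, 3.0]]
factors = neighborhood_outlier_factors(points, attrs, cell_size=1.0)
print(max(range(len(factors)), key=factors.__getitem__))
```

In this toy data the spatial coordinates are used only to form neighborhoods, while the comparison itself is made on the non-spatial attributes, which is the separation the abstract describes. A method that treated all four columns as one multi-dimensional space would have less reason to flag object 3, because its attribute values lie well inside the global range.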