地球信息科学学报
GEO-INFORMATION SCIENCE
2015
Issue 1
pp. 1-7
7 pages in total
魏海涛%杜云艳%任浩玮%刘张%易嘉伟%许开辉
point data%discreteness%N-KD Tree%K-D Tree%dynamic grouping%data parallel
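With advances in science and technology, the analysis and processing of geospatial data face the dual challenge of swelling data volumes and rapidly growing computational load. To address the slow processing of massive data, this paper targets point data that are unevenly distributed in space. From a data-parallel perspective, and with the goals of preserving the spatial proximity of the data and balancing the data load across groups after grouping, it proposes a dynamic data grouping method based on the N-KD Tree (Number-K Dimension Tree), a dynamic spatial data grouping method designed for real-time change (in both data volume and spatial extent). The method borrows the ideas of K-D Tree construction and nearest-neighbor search: it uses variance to judge how sparsely the data are distributed and nearest-neighbor search to handle boundary points, producing an unequal partition of space while keeping the data volume of each group roughly balanced. Experiments show that the method achieves good dynamic grouping results and high computational efficiency; that it supports grouping of spatial point data in any distribution; that the data load of the resulting groups is balanced; and that the grouping algorithm itself supports parallel execution and a distributed collaborative working mode.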
The rapid growth in both data volume and computational load presents a major challenge for spatial data processing in geography. To accelerate the processing of constantly updated real-time point data that are unevenly distributed in space, this study develops a dynamic grouping method based on the Number-K Dimension (N-KD) Tree. Taking a data-parallel perspective, the method aims to preserve spatial proximity and to balance the data load across groups when grouping real-time data that vary in both quantity and spatial extent. Building on K-D Tree construction, the algorithm measures data sparsity by variance, handles points near partition boundaries with nearest-neighbor search, and produces a spatially unequal partition whose groups carry balanced data loads. Experiments show that the method efficiently groups point data with various spatial distributions and balances the data volume among groups. The algorithm itself can also be parallelized and supports a distributed collaborative working mode, which underlines its practical value.
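
The core idea of a variance-guided, count-balanced split can be illustrated with a minimal Python sketch. This is not the authors' N-KD Tree implementation: the function name `balanced_groups`, the proportional cut rule, and the sample data are assumptions made for illustration, and the nearest-neighbor handling of boundary points described in the abstract is omitted. The sketch also assumes there are at least as many points as requested groups.

```python
import random
from statistics import pvariance


def balanced_groups(points, n_groups):
    """Recursively split points into n_groups of near-equal size (hypothetical helper)."""
    if n_groups == 1:
        return [points]
    dims = len(points[0])
    # Split along the axis on which the points are most spread out (largest variance).
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    left_groups = n_groups // 2
    # Cut by point count, not by coordinate, so each side receives a share of
    # points proportional to the number of groups it must still produce.
    cut = len(ordered) * left_groups // n_groups
    return (balanced_groups(ordered[:cut], left_groups)
            + balanced_groups(ordered[cut:], n_groups - left_groups))


random.seed(0)
# Unevenly distributed sample: a dense cluster plus a sparse background.
pts = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(900)]
pts += [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(100)]

groups = balanced_groups(pts, 6)
sizes = [len(g) for g in groups]
print(sizes)
```

Because every cut is placed by point count along a single axis, the resulting regions are unequal in area but hold nearly the same number of points, and each group stays spatially contiguous along the split axes. Each group could then be handed to a separate worker for data-parallel processing.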