地球信息科学学报
GEO-INFORMATION SCIENCE
2013
Issue 2
pp. 166-174
9 pages in total
陈崇成%林剑峰%吴小竹%巫建伟%连惠群
空间数据%云存储%NoSQL%地理知识云%数据聚合中心
geo-spatial data%vector data%cloud-enabled data storage%NoSQL%Geographical Knowledge Cloud%data aggregation%access service
近年来,实现海量空间数据高效地存储管理和在线服务,成为地学信息科学领域日益关注的热点问题。本文根据矢量和栅格空间数据的不同特点,提出并实现了矢量栅格数据一体化的海量空间数据分布式云存储管理与访问服务方案,在海量矢量数据存储和处理中创新性引入分布式图数据库Neo4J和并行图计算框架。在三层式空间数据云存储架构基础上,给出NoSQL数据库技术的栅格和矢量数据云存储的实现策略与方法,并开展了通用数据访问接口的设计。采用分布式文件系统 HDFS 存储栅格数据,并使用列族数据库 HBase 对其建立分布式空间索引,及采用满足ACID约束的分布式图数据库Neo4J来存储矢量数据,并使用R树建立空间索引。在自主研发的地理知识云平台GeoKSCloud框架下,初步实现了核心组件-空间数据聚合中心(GeoDAC)软件,可为各类用户提供空间数据分布式存储管理和访问服务。通过搭建试验床,开展GeoDAC与开源GIS软件PostGIS在矢量数据读写访问性能方面的对比测试。结果表明,虽然GeoDAC没有获得写入性能的加速作用,但其具有PostGIS无法比拟的强大读取性能。GeoDAC将海量数据经过空间分割后分布在集群上,能够并行处理查询请求,极大地提高空间查询速度,具有广阔的应用前景。
In recent years, with the explosive growth of Earth Observation System (EOS) data and the flourishing of the new geography paradigm, efficient storage management of massive geo-spatial data, together with online services for a broad variety of users, has become an increasingly hot issue in the field of geographical information science. In light of the intrinsic differences between vector and raster data, this paper proposes and implements a distributed cloud storage management and access service scheme for massive geo-spatial data that integrates both formats. Based on a three-tier cloud storage architecture, we put forward implementation strategies and methods for cloud storage of raster and vector data using NoSQL database technology, followed by the design of a universal data access interface. Novel technologies, namely the distributed graph database Neo4J and a parallel graph computing framework, are introduced for massive vector data storage and processing. Raster data is stored in the distributed file system HDFS, with a distributed spatial index built on it in the column-family database HBase; vector data is stored in the distributed graph database Neo4J, which satisfies ACID constraints, with an R-tree spatial index. Under the unified framework of the Geographical Knowledge Cloud platform GeoKSCloud, developed by our research group, its core component, the spatial data aggregation centre (GeoDAC) software, has been preliminarily implemented to provide distributed spatial data storage management and access services for all types of end users. A testbed was established with 5 physical nodes and, correspondingly, 7 virtual nodes spanning different areas and operating systems. We carried out a comparison between GeoDAC and the open-source GIS software PostGIS on vector data read and write performance. The preliminary results indicate that, although GeoDAC gains no write speed-up over PostGIS, its read and spatial query performance is significantly better. Inside GeoDAC, massive data is spatially partitioned and distributed across the cluster, so spatial queries are processed in parallel, which greatly increases spatial query speed. The techniques and system developed in this work offer a variety of users a powerful tool for further in-depth processing and have broad application prospects.