计算机与数字工程
Computer and Digital Engineering
2015
Issue 10
pp. 1717-1722, 1728
7 pages in total
high-dimensional data%big data%variable grid%M-tree
With the rapid development of the Internet and cloud computing, the volume of data across all sectors of the national economy has grown sharply, and large amounts of massive high-dimensional data have accumulated, such as online transaction data, user review data, and multimedia data. An effective index structure for massive high-dimensional data can improve the performance of high-dimensional query processing in a big data environment. This paper first proposes a distributed two-level index structure for high-dimensional data, based on a variable grid: the global index maintains the positional relationships of all subspaces in the data space, and each local index builds an M-tree over its subspace to manage that subspace's data. Second, it proposes similarity query algorithms based on the two-level index, covering point queries and range queries. At query time, the global index quickly locates the local index nodes relevant to the query and forwards the query to them, and the search then runs in parallel on those local nodes, which avoids searching subspaces that are irrelevant to the query. Finally, extensive experiments show that the proposed index outperforms existing high-dimensional index structures and offers good query performance and scalability.
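The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several simplifying assumptions: the grid uses a uniform cell width rather than a variable grid, a plain list with a linear scan stands in for each local M-tree, distances are Euclidean, and the local searches run sequentially instead of in parallel. The names `TwoLevelIndex`, `cell_width`, and `candidate_cells` are hypothetical.

```python
import math
from collections import defaultdict


class TwoLevelIndex:
    """Toy two-level index: a global grid directory plus one local store per cell."""

    def __init__(self, cell_width):
        # ASSUMPTION: uniform cell width; the paper describes a variable grid.
        self.cell_width = cell_width
        # ASSUMPTION: a plain list per cell stands in for the local M-tree.
        self.cells = defaultdict(list)

    def _cell_of(self, point):
        # Global index lookup: map a point to the key of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point):
        self.cells[self._cell_of(point)].append(tuple(point))

    def _min_dist_to_cell(self, key, center):
        # Smallest Euclidean distance from the query center to the cell's box.
        total = 0.0
        for k, c in zip(key, center):
            lo, hi = k * self.cell_width, (k + 1) * self.cell_width
            gap = max(lo - c, 0.0, c - hi)
            total += gap * gap
        return math.sqrt(total)

    def candidate_cells(self, center, radius):
        # Global step: keep only the non-empty cells that the query ball can reach.
        return [key for key in self.cells
                if self._min_dist_to_cell(key, center) <= radius]

    def point_query(self, point):
        # Only the single cell that could contain the point is searched.
        return tuple(point) in self.cells.get(self._cell_of(point), [])

    def range_query(self, center, radius):
        # Local step: search each candidate cell. The searches are independent,
        # so they could be dispatched in parallel; here they run one by one.
        center = tuple(center)
        hits = []
        for key in self.candidate_cells(center, radius):
            hits.extend(p for p in self.cells[key]
                        if math.dist(p, center) <= radius)
        return sorted(hits)


index = TwoLevelIndex(cell_width=1.0)
for p in [(0.2, 0.3), (0.9, 0.8), (2.5, 2.5), (5.0, 5.0)]:
    index.insert(p)

print(index.point_query((2.5, 2.5)))
print(index.range_query((0.5, 0.5), 0.6))
```

In this sketch the global step iterates over the non-empty cells instead of enumerating every grid cell inside the query's bounding box, because the number of grid cells grows exponentially with the dimension. A real system would replace the per-cell list with a metric index such as an M-tree and place the cells on separate nodes.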