计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2015, No. 8, pp. 897-905 (9 pages)
Yang Linqing (杨林青), Li Zhan (李湛), Mou Yanchao (牟雁超), Fan Lilue (樊里略), Li Hongyan (李红燕), Wang Tengjiao (王腾蛟), Lei Kai (雷凯)
Keywords: large data set; Top-k Skyline; representative information; parallel processing; filter rule
As data of unprecedented scale become accessible, it is increasingly important to help users locate representative information of a manageable size. Top-k Skyline queries find the k most representative records in a data set while keeping the result size under control, so they meet this requirement. However, existing Top-k Skyline query methods are inefficient on large-scale data sets and do not scale to them. To address this, this paper combines Top-k Skyline queries with parallel processing and proposes PTKS (parallel Top-k Skyline), a parallel Top-k Skyline query algorithm for large-scale data sets. PTKS makes full use of distributed resources to parallelize the original query efficiently, and it adds a filter rule, based on user preferences, that reduces the size of the result set to meet user needs. Experiments on real data sets, with comparisons against existing methods, show that PTKS achieves higher query efficiency on large-scale data sets and is well suited to them.
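The abstract does not say how PTKS partitions the data or how it scores skyline points. As a rough illustration only, the sketch below assumes a common partition-and-merge scheme. Each partition computes a local skyline in parallel. The union of the local skylines is then reduced to the global skyline, which works because every global skyline point survives in its own partition. Finally, the skyline points are ranked by how many records they dominate, which is one of several Top-k Skyline scoring functions. The function names, the round-robin split, the dominance-count score, and the use of threads in place of a distributed cluster are all hypothetical and are not the paper's method. The user-preference filter rule is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # one record; smaller is better in every dimension (assumption)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def local_skyline(points: List[Point]) -> List[Point]:
    """Block-nested-loop skyline of one partition."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a kept point, so it cannot be in the skyline
        # p survives: evict any kept points that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def parallel_skyline(points: List[Point], n_parts: int = 4) -> List[Point]:
    """Partition, compute local skylines concurrently, then merge.

    Every global skyline point is undominated inside its own partition, so the
    union of local skylines contains the whole global skyline; one more skyline
    pass over that union drops points that were only locally optimal.
    """
    parts = [points[i::n_parts] for i in range(n_parts)]  # hypothetical round-robin split
    with ThreadPoolExecutor(max_workers=n_parts) as pool:  # stand-in for a cluster
        local = list(pool.map(local_skyline, parts))
    candidates = [p for sky in local for p in sky]
    return local_skyline(candidates)


def top_k_skyline(points: List[Point], k: int, n_parts: int = 4) -> List[Point]:
    """Rank skyline points by how many records they dominate (assumed score), keep k."""
    sky = parallel_skyline(points, n_parts)

    def dominance_count(s: Point) -> int:
        return sum(1 for q in points if dominates(s, q))

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        scores = list(pool.map(dominance_count, sky))
    # Highest score first; ties broken by the point itself for a deterministic result
    ranked = sorted(zip(sky, scores), key=lambda t: (-t[1], t[0]))
    return [p for p, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance), both to be minimized
    hotels = [(1, 9), (2, 7), (3, 3), (4, 4), (5, 2), (6, 6), (7, 1), (8, 8), (9, 9)]
    print(top_k_skyline(hotels, 2))  # [(3, 3), (5, 2)]
```

In the hotel example, the skyline is (1, 9), (2, 7), (3, 3), (5, 2), and (7, 1). Point (3, 3) dominates four records and (5, 2) dominates three, so these two form the top-2 result. Python threads only illustrate the data flow. Real speedups on large data sets would come from running the per-partition and dominance-counting steps on separate processes or machines.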