CAJ | Academic paper

随着数据规模的日益庞大，在大规模数据集中帮助用户定位出数据量可控的代表性信息显得越发重要。虽然Top-k Skyline查询能够找到数据集中前k个最具代表性的信息，在获取代表性信息的同时又控制了结果规模，满足了上述要求，但是现有的Top-k Skyline查询在面对大规模数据集时效率较低，并不适用于大规模数据集。为了解决这个问题，将Top-k Skyline查询与并行化处理相结合，提出了一种面向大规模数据集的并行化Top-k Skyline查询算法PTKS（parallel Top-k Skyline），通过充分利用分布式资源，将原有查询进行有效的并行化处理，同时设计了基于用户偏好的用于缩减结果数据量的筛选规则，满足用户需求。在真实数据集上进行了相关实验，并与现有方法进行了对比，结果表明PTKS在大规模数据集上的查询效率更具有优势，能很好地适用于大规模数据集。
As data grow to an unprecedented scale, it becomes increasingly important to help users identify representative information of a manageable size within large data sets. Top-k Skyline queries find the k most representative items in a data set, so they return representative information while also bounding the size of the result, which meets this requirement. However, existing Top-k Skyline query methods are inefficient on large data sets and are therefore poorly suited to them. To address this problem, this paper combines the Top-k Skyline query with parallel processing and proposes a parallel Top-k Skyline query algorithm for large data sets, named PTKS (parallel Top-k Skyline). PTKS exploits distributed resources to parallelize the original query effectively, and it includes a filter rule based on user preferences that reduces the size of the result to meet user needs. Experiments on real data sets, with comparisons against existing methods, show that PTKS achieves better query efficiency on large data sets and is well suited to them.
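The general idea of a partitioned Top-k Skyline query can be illustrated with a minimal Python sketch. This is not the PTKS algorithm itself, whose details the abstract does not give. It assumes that smaller values are better in every dimension, uses a thread pool as a stand-in for distributed workers, and models the user-preference filter as a weighted sum. The names `dominates`, `skyline`, and `parallel_top_k_skyline`, as well as the `weights` and `n_parts` parameters, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_top_k_skyline(points: Sequence[Point],
                           weights: Sequence[float],
                           k: int,
                           n_parts: int = 4) -> List[Point]:
    # Step 1: split the data into n_parts partitions (round-robin).
    parts = [points[i::n_parts] for i in range(n_parts)]

    # Step 2: compute a local skyline per partition concurrently.
    # A thread pool stands in here for distributed workers.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local_skylines = list(pool.map(skyline, parts))

    # Step 3: merge. Every global skyline point is also a local skyline
    # point, so the union of local skylines is a complete candidate set.
    candidates = [p for part in local_skylines for p in part]
    global_skyline = skyline(candidates)

    # Step 4: rank by a weighted sum (hypothetical stand-in for a
    # user-preference rule) and keep the k best-scoring points.
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))

    return sorted(global_skyline, key=score)[:k]


points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (7, 5), (9, 1), (8, 8)]
top2 = parallel_top_k_skyline(points, weights=(3, 1), k=2)
print(top2)  # [(1, 9), (2, 7)]
```

The merge step is what makes partitioning safe: a point dominated inside its own partition is also dominated globally, so discarding it locally cannot change the final skyline. In this sketch the cap `k` bounds the result size, while the weights decide which skyline points survive the cut.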