
Top-k queries are a widely used operation. Taking existing top-k algorithms as the basis for analysis, this paper proposes solutions to their shortcomings. It first proposes SRTA (Sequential-Read Threshold Algorithm). Compared with the NRA (No Random Access) algorithm, SRTA re-plans how the data is stored. It builds a new table that shifts the memory overhead onto cheaper external storage, so an efficient top-k query needs only sequential reads. The table is also partitioned, which further improves efficiency under parallel processing and lets the algorithm run well in memory-constrained environments. Building on SRTA, the paper then proposes DSRTA (Distributed Sequential-Read Threshold Algorithm) for distributed environments. DSRTA first divides the original data set into multiple subspaces by ID and then re-plans the data storage. By exploiting the performance advantages of a distributed system, it further improves the query efficiency of SRTA.
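The abstract does not specify SRTA's table layout or how DSRTA combines results from its subspaces. The Python sketch below shows one plausible reading under assumptions that do not come from the paper. It assumes the aggregate score is the sum of m attribute values. It assumes the "new table" interleaves the m descending sorted lists in round-robin order and repeats each object's full attribute vector in every row. Under those assumptions, a single sequential scan with a threshold-algorithm stopping rule finds the top-k with no random access and only O(k + m) working memory. It also assumes that DSRTA-style ID division sends object `oid` to subspace `oid % p`, scans each subspace, and merges the local top-k lists. The names `build_sequential_table`, `srta_like_topk`, `partition_by_id`, and `dsrta_like_topk` are hypothetical, and threads stand in for distributed nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def build_sequential_table(data, m):
    """Hypothetical SRTA-style table (an assumption, not the paper's layout):
    the m attribute lists, each sorted in descending order, interleaved
    round-robin. Every row also carries the object's full attribute vector,
    trading extra external storage for no random access later."""
    lists = [sorted(data, key=lambda oid: data[oid][j], reverse=True) for j in range(m)]
    table = []
    for depth in range(len(data)):
        for j in range(m):
            oid = lists[j][depth]
            # Row = (list index, value at this position, object id, full vector)
            table.append((j, data[oid][j], oid, data[oid]))
    return table


def srta_like_topk(table, m, k):
    """Single sequential scan with a threshold-algorithm stopping rule.
    Working memory is O(k + m): the k best (score, oid) pairs and the last
    value read from each list. The score is assumed to be a plain sum."""
    last = [float("inf")] * m  # last value read from each sorted list
    heap = []  # min-heap holding the best k (score, oid) pairs seen so far
    for j, value, oid, vector in table:
        last[j] = value
        score = sum(vector)  # the full score comes from the row itself
        # An object evicted or rejected earlier scores <= the current k-th best,
        # so re-reading it is harmless; only objects still in the heap are skipped.
        if any(kept_oid == oid for _, kept_oid in heap):
            pass
        elif len(heap) < k:
            heapq.heappush(heap, (score, oid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, oid))
        # Any object not read yet scores at most sum(last), since the sum is monotone.
        if len(heap) == k and heap[0][0] >= sum(last):
            break
    return sorted(heap, reverse=True)


def partition_by_id(data, p):
    """Hypothetical ID division: object oid goes to subspace oid % p."""
    parts = [{} for _ in range(p)]
    for oid, vector in data.items():
        parts[oid % p][oid] = vector
    return parts


def dsrta_like_topk(data, m, k, p):
    """Scan each non-empty subspace, then merge the local top-k lists.
    Threads stand in for distributed nodes in this sketch."""
    parts = [part for part in partition_by_id(data, p) if part]
    with ThreadPoolExecutor(max_workers=p) as pool:
        local = pool.map(lambda part: srta_like_topk(build_sequential_table(part, m), m, k), parts)
        merged = [row for result in local for row in result]
    return heapq.nlargest(k, merged)


data = {1: [0.5, 0.25], 2: [0.25, 0.75], 3: [0.75, 0.75], 4: [0.125, 0.25]}
print(dsrta_like_topk(data, m=2, k=2, p=2))  # prints [(1.5, 3), (1.0, 2)]
```

ID division places every object entirely in one subspace. Each member of the global top-k is therefore also in its own subspace's local top-k, so merging the local lists gives an exact answer. The cost of the assumed layout is storage, because each object's vector is written m times. This is one way to trade memory for cheaper external storage as the abstract describes. Python threads do not run CPU-bound work in parallel, so a real deployment would use separate processes or machines.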