COMPUTER APPLICATION
2015, Issue z1, pp. 69-73 (5 pages)
Bi Fangming (毕方明), Chen Wei (陈伟), Yang Kui (杨魁), Che Ben (车奔)
Keywords: distributed; data storage; data partitioning; sorted access; limited memory
The top-k query is a widely used operation. Taking existing top-k algorithms as the basis for analysis, this paper proposes solutions to their shortcomings. Compared with the NRA (No Random Access) algorithm, the proposed SRTA (Sequential-Read Threshold Algorithm) reorganizes how the data are stored: it creates a new table that shifts memory overhead to cheaper external storage, so that efficient top-k queries can be answered with sorted (sequential) access alone. The table is also partitioned, which further improves efficiency under parallel processing and allows the algorithm to run well with limited memory. Built on SRTA, the DSRTA (Distributed SRTA) algorithm targets distributed environments: it first divides the original data set into multiple subspaces by object ID and then reorganizes the data storage, exploiting the performance advantages of distributed processing to further improve the query efficiency of SRTA.
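The abstract positions SRTA against NRA, which answers top-k queries using sorted access only. As background, the following is a minimal Python sketch of an NRA-style search over m score lists with sum aggregation, plus a hypothetical ID-partitioned wrapper in the spirit of DSRTA's ID division. It is not SRTA or DSRTA: the paper's new table layout is not described here, so the code only illustrates the baseline and the partition-then-merge idea. Assumptions (not from the paper): integer object IDs, non-negative scores, every list contains every object exactly once, sum as the aggregate, and subspaces chosen by `obj % p`. The names `nra_top_k`, `partition_by_id`, `distributed_top_k` and `build_lists` are hypothetical. The `seen` dictionary, which holds partial scores for every candidate read so far, is presumably the kind of in-memory overhead the abstract says SRTA moves to external storage.

```python
import heapq
from typing import Dict, List, Tuple

Entry = Tuple[int, float]  # (object id, attribute score)


def nra_top_k(lists: List[List[Entry]], k: int) -> Tuple[List[Entry], int]:
    """Top-k by sum using sorted (sequential) access only, NRA style.

    Assumes every list holds every object once, sorted by score descending,
    with non-negative scores. Returns (results, rows read per list).
    """
    m = len(lists)
    seen: Dict[int, Dict[int, float]] = {}  # object -> {list index: score}
    last = [0.0] * m                        # last score read from each list
    depth = max((len(lst) for lst in lists), default=0)
    for d in range(depth):
        for i, lst in enumerate(lists):
            if d < len(lst):
                obj, s = lst[d]
                seen.setdefault(obj, {})[i] = s
                last[i] = s
        if len(seen) < k:
            continue
        # Lower bound: unknown scores count as 0. Upper bound: an unknown score
        # is at most the last score read from that list.
        lower = {o: sum(sc.values()) for o, sc in seen.items()}
        upper = {o: lower[o] + sum(last[i] for i in range(m) if i not in sc)
                 for o, sc in seen.items()}
        ranked = sorted(seen, key=lambda o: lower[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        kth = lower[top[-1]]
        rivals = max((upper[o] for o in rest), default=float("-inf"))
        unseen = sum(last)  # best possible total for an object not yet read
        # Stricter than textbook NRA: also wait until the k winners are fully
        # seen, so the returned scores are exact and can be merged later.
        if kth >= max(rivals, unseen) and all(len(seen[o]) == m for o in top):
            return [(o, lower[o]) for o in top], d + 1
    totals = sorted(((o, sum(sc.values())) for o, sc in seen.items()),
                    key=lambda t: t[1], reverse=True)
    return totals[:k], depth


def partition_by_id(lists: List[List[Entry]], p: int) -> List[List[List[Entry]]]:
    """Hypothetical ID division: object o goes to subspace o % p.

    All attributes of an object land in the same subspace, and each
    per-subspace list stays sorted because entries are copied in order.
    """
    parts: List[List[List[Entry]]] = [[[] for _ in lists] for _ in range(p)]
    for i, lst in enumerate(lists):
        for obj, s in lst:
            parts[obj % p][i].append((obj, s))
    return parts


def distributed_top_k(lists: List[List[Entry]], k: int, p: int) -> List[Entry]:
    # Each subspace would run on its own node; here they run one after another.
    local = [nra_top_k(part, k)[0] for part in partition_by_id(lists, p)]
    # The global top-k is contained in the union of the local top-k sets.
    return heapq.nlargest(k, (e for res in local for e in res), key=lambda t: t[1])


def build_lists(scores: Dict[int, Tuple[float, ...]]) -> List[List[Entry]]:
    """Turn {object: (score per attribute)} into one sorted list per attribute."""
    m = len(next(iter(scores.values())))
    return [sorted(((o, v[i]) for o, v in scores.items()),
                   key=lambda t: t[1], reverse=True) for i in range(m)]


if __name__ == "__main__":
    SCORES = {1: (0.9, 0.1, 0.5), 2: (0.8, 0.7, 0.6), 3: (0.2, 0.9, 0.8),
              4: (0.5, 0.5, 0.5), 5: (0.1, 0.2, 0.3), 6: (0.7, 0.6, 0.1)}
    lists = build_lists(SCORES)
    print(nra_top_k(lists, 2))
    print(distributed_top_k(lists, 2, 3))
```

With this toy data both calls should return objects 2 and 3. The partition-then-merge step is only correct because ID division keeps each object's full score inside one subspace, so no object can be split across nodes.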