计算机工程与应用
計算機工程與應用
계산궤공정여응용
COMPUTER ENGINEERING AND APPLICATIONS
2013年
17期
116-120
,共5页
海量数据%To p-K%循环阈值算法(R RTA)%顺序读取%有限内存
海量數據%To p-K%循環閾值算法(R RTA)%順序讀取%有限內存
해량수거%To p-K%순배역치산법(R RTA)%순서독취%유한내존
massive data%Top-K%Round-Robin Threshold Algorithm(RRTA)%sorted access%limited memory
To p-K查询是一种被广泛应用的操作,它根据给定的评分函数在潜在的海量数据中返回k个分值最高的元组。传统的TA算法要求能够支持随机读,NRA算法虽然放宽了对随机读的限制,但是增长阶段需要在内存中维护大量的元组,运行时将占用大量的内存资源。提出的RRTA算法相比NRA算法对数据的存储进行了重新的规划,创建一个新的表将内存上的开销转换到较廉价的外存开销,只需顺序读取就可以进行有效的To p-K查询,同时将表进行了划分,在并行处理的情况下更能提高程序的效率,能够很好地运行在内存有限的环境中。
To p-K查詢是一種被廣汎應用的操作,它根據給定的評分函數在潛在的海量數據中返迴k箇分值最高的元組。傳統的TA算法要求能夠支持隨機讀,NRA算法雖然放寬瞭對隨機讀的限製,但是增長階段需要在內存中維護大量的元組,運行時將佔用大量的內存資源。提齣的RRTA算法相比NRA算法對數據的存儲進行瞭重新的規劃,創建一箇新的錶將內存上的開銷轉換到較廉價的外存開銷,隻需順序讀取就可以進行有效的To p-K查詢,同時將錶進行瞭劃分,在併行處理的情況下更能提高程序的效率,能夠很好地運行在內存有限的環境中。
To p-K사순시일충피엄범응용적조작,타근거급정적평분함수재잠재적해량수거중반회k개분치최고적원조。전통적TA산법요구능구지지수궤독,NRA산법수연방관료대수궤독적한제,단시증장계단수요재내존중유호대량적원조,운행시장점용대량적내존자원。제출적RRTA산법상비NRA산법대수거적존저진행료중신적규화,창건일개신적표장내존상적개소전환도교렴개적외존개소,지수순서독취취가이진행유효적To p-K사순,동시장표진행료화분,재병행처리적정황하경능제고정서적효솔,능구흔호지운행재내존유한적배경중。
Top-K query is a widely used operation, which can return the highest score of the tuple in specialized massive data-bases, according to the given monotone aggregation function. Traditional TA algorithm requires the ability to support random access, NRA algorithm although relaxes the restrictions on the random access, it is found that in massive data context, NRA needs to maintain large quantity of candidate tuples in memory in the increasing phase, it will also use up a lot of memory resources in runtime. Compared with the NRA algorithm, the RRTA(Round-Robin Threshold Algorithm)which proposed in this paper replants the data storage mode, which creats a new table to switch the memory overhead to the cheaper external memory over-head, so just sorted access is also able to do efficient top-k query. Meanwhile, the table has been divided, which makes the algo-rithm more efficient and smoother even with limited memory, in the case of parallel processing.