微型机与应用
MICROCOMPUTER & ITS APPLICATIONS
2014
Issue 13
pp. 42-44, 48 (4 pages in total)
Tan Yuesheng (谭跃生)%Zhao Yulong (赵玉龙)%Wang Jingyu (王静宇)
Hadoop%the small file problem%least squares curve fitting%linear fitting
The current Hadoop platform cannot efficiently process massive numbers of small files, which gives rise to the small file problem. To address it, a method based on least squares curve fitting is proposed for determining what counts as a small file on Hadoop, that is, "how small is small". The method first defines how the access time of small files is quantified, then uses access time as the influencing factor in deciding what a small file is. Through experiments that measure access times for data sets of different sizes, and by applying linear fitting, a way of quantifying the small file size is obtained.
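The linear fitting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fit_line`, the variable names, and the sample measurements are all hypothetical, and the abstract does not say how the fitted line is turned into a size threshold, so that step is left out. The sketch only shows an ordinary least-squares fit of access time against file size.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements (NOT taken from the paper): file size in MB and
# access time in ms, chosen so that the points lie on y = 2x + 10.
file_sizes_mb = [1.0, 2.0, 4.0, 8.0, 16.0]
access_times_ms = [12.0, 14.0, 18.0, 26.0, 42.0]

slope, intercept = fit_line(file_sizes_mb, access_times_ms)
print(f"access_time ~= {slope:.3f} * size + {intercept:.3f}")
```

With real measurements the points would not fall exactly on a line, and the residuals of the fit would indicate how well a linear model describes access time over the measured size range.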