JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT
2015
Issue 7
pp. 1522-1530
9 pages in total
Zhou Enqiang%Zhang Wei%Lu Yutong%Hou Hongjun%Dong Yong
data-intensive computing%cache%local storage%shared storage%seismic data processing
As high-performance computers are increasingly applied to large-scale data processing, the storage system is becoming the main bottleneck limiting data processing efficiency. Based on an analysis of several key factors that affect the I/O performance of data-intensive computing, this paper proposes building a cooperative non-volatile cache from the local storage of compute nodes, so that a distributed storage architecture accelerates a centralized one. The method coordinates the distributed local storage resources at the application layer and uses non-volatile storage media to form a large cache space that holds the intermediate results of large-scale data analysis, thereby achieving a high cache hit ratio. It avoids I/O contention through measures such as concurrency-constraint control and exploits the particular performance advantages of local storage to secure the caching speedup, which effectively improves the I/O efficiency of large-scale data processing. Tests on multiple platforms with multiple I/O patterns confirm the effectiveness of the method: aggregate I/O bandwidth scales well, and the overall performance of a typical data-intensive application improves by up to 6 times.
With HPC systems widely used in modern scientific computing, more data-intensive applications are generating and analyzing datasets of increasing scale, which poses new challenges for HPC storage systems. By comparing different storage architectures and their corresponding file system approaches, a novel cache approach named DDCache is proposed to improve the efficiency of data-intensive computing. DDCache uses the distributed storage architecture as a performance booster for the centralized storage architecture by fully exploiting the potential benefits of node-local storage distributed across the system. To supply a much larger cache volume than a volatile memory cache can, DDCache aggregates the node-local disks into a large non-volatile cooperative cache. A high cache hit ratio is then achieved by keeping intermediate data in DDCache for as long as possible over the whole run of an application. To make node-local storage efficient enough to act as a data cache, a locality-aware data layout keeps cached data close to the compute tasks and evenly distributed. Furthermore, concurrency control is introduced to throttle I/O requests flowing into or out of DDCache and to recover the particular performance advantage of node-local storage. Evaluations on typical HPC platforms verify the effectiveness of DDCache. Scalable I/O bandwidth is achieved in the well-known HPC scenario of checkpoint/restart, and the overall performance of a typical data-intensive application is improved by up to 6 times.
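The abstract names two mechanisms, locality-aware placement and concurrency throttling, without giving implementation details. The following is only a minimal sketch of how the two ideas might fit together, not DDCache's actual design. The class `ThrottledNodeCache`, its methods, and the `max_concurrent_io` parameter are hypothetical, and in-memory dictionaries stand in for node-local disks.

```python
import hashlib
import threading


class ThrottledNodeCache:
    """Toy model of a cooperative cache spread over node-local storage.

    Hypothetical illustration only: dicts stand in for node-local disks.
    """

    def __init__(self, nodes, max_concurrent_io=2):
        self.nodes = list(nodes)
        # One dict per node stands in for that node's local disk.
        self.store = {n: {} for n in self.nodes}
        # One semaphore per node caps simultaneous I/O on that disk.
        self.gates = {n: threading.Semaphore(max_concurrent_io)
                      for n in self.nodes}
        # Directory recording which node holds each cached key.
        self.index = {}
        self.index_lock = threading.Lock()

    def place(self, key, preferred_node=None):
        """Prefer the node that runs the consuming task; else spread by hash."""
        if preferred_node in self.store:
            return preferred_node
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key, data, preferred_node=None):
        with self.index_lock:
            # Reuse the existing location so a key never has two copies.
            node = self.index.get(key) or self.place(key, preferred_node)
            self.index[key] = node
        with self.gates[node]:  # blocks while the disk is at its I/O limit
            self.store[node][key] = data
        return node

    def get(self, key):
        """Return cached data, or None on a miss (read shared storage instead)."""
        with self.index_lock:
            node = self.index.get(key)
        if node is None:
            return None
        with self.gates[node]:
            return self.store[node].get(key)


cache = ThrottledNodeCache(["n0", "n1", "n2"], max_concurrent_io=1)
cache.put("intermediate-0", b"partial result", preferred_node="n1")
print(cache.get("intermediate-0"))
```

Capping in-flight requests per node reflects the abstract's point that unthrottled concurrent access can erase the performance advantage of local storage. The limit of one or two requests used here is an arbitrary choice for illustration.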