应用气象学报
QUARTERLY JOURNAL OF APPLIED METEOROLOGY
2014
Issue 5
pp. 618-628
11 pages
杨润芝%沈文海%肖卫青%胡开喜%杨昕%王颖%田伟
MapReduce%cloud computing%Hadoop%historical data compilation
MapReduce%cloud computing%Hadoop%meteorological data processing
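Cloud computing uses distributed computing to deliver the capacity and efficiency of parallel computation, overcoming the limited computing power of a single server. Climate standard values computed from long-sequence historical data are important for real-time operations, near-real-time operations, and scientific research in meteorology. Because long-sequence historical data are large in volume and the processing logic is relatively complex, compiling them on a traditional single-node platform takes a very long time. This paper builds a cluster-mode cloud computing platform on the Hadoop distributed computing framework, takes long-sequence historical data as the source data, and implements some of the compilation algorithms with the MapReduce computing model to shorten computing time. In addition, because the source data consist of many files that are each small, the storage format and file size of the source data are modified: the SequenceFile approach and the text-file merging approach are each used in timing comparison tests on the same scenario, with merges of 10 files and of 100 files tested separately, which improves timeliness further.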
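The MapReduce pattern described in the abstract can be illustrated with a minimal, single-process sketch. It is not the paper's implementation and does not use Hadoop: the record layout `(station_id, year, month, temperature_c)`, the station names, the sample values, and the choice of a monthly mean as the "climate standard value" are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records (assumed layout): (station_id, year, month, temperature_c)
RECORDS = [
    ("S001", 1981, 1, -3.0),
    ("S001", 1982, 1, -4.0),
    ("S001", 1981, 7, 26.0),
    ("S001", 1982, 7, 27.0),
    ("S002", 1981, 1, 4.0),
    ("S002", 1982, 1, 5.0),
]


def map_phase(record):
    """Emit ((station, month), value) so every year of one month shares a key."""
    station, _year, month, value = record
    yield (station, month), value


def shuffle(pairs):
    """Group values by key; a MapReduce framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Average all yearly values collected for one (station, month) key."""
    return key, mean(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


normals = run_job(RECORDS)
```

On a real cluster the map and reduce functions would run on different nodes over file splits; keying by station and month is what lets each reducer work independently of the others.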
Cloud computing, which overcomes the limited computing power of a standalone server, uses distributed computing to achieve the capacity and efficiency of parallel computation. It is a new application model for decentralized computing that can provide reliable, customized services to the maximum number of users with minimum resources, and it is also an important way to carry out theoretical research and practical application of cloud computing in combination with other theories and techniques. Cloud computing is widely applied across many industries and fields, and its flexibility, ease of use, and stability are gradually being recognized. In the meteorological sector, the development of cloud-based platforms for scientific computing is still very limited, but some attempts have been made as cloud computing matures.
In meteorological operations, large-scale scientific computing and other general computing models are run on high-performance server clusters. Because resources and the number of HPC nodes are limited, scientific computing still relies on the traditional standalone or clustered mode. Therefore, an internal exploration of a platform that combines conventional general-purpose computing with cloud computing is very meaningful for the meteorological department. A valuable 60-year long sequence of historical data is stored at the National Meteorological Information Center for use in real-time and near-real-time operations and in research. Processing these historical data is time-consuming, so some new methods are implemented.
A cluster-mode platform is built on the Hadoop cloud computing platform, and a variety of statistical methods are implemented with the MapReduce computation model. The storage format of the source data is changed to SequenceFile, which consists of serialized <Key,Value> pairs; in this way, multiple Format-A files are merged into one large SequenceFile to test the change in computational efficiency. In addition, many small files are merged into a larger file. Configurations of the Hadoop cluster environment are modified experimentally, and different numbers of task nodes are used to record the resulting computational efficiency.
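The idea of merging many small files into fewer large <Key,Value> containers can be sketched as follows. This is only an illustration under stated assumptions: it does not write the real Hadoop SequenceFile binary format, and the helper names `merge_in_groups` and `make_small_files`, the file names, and the file contents are hypothetical. The file name serves as the key and the file content as the value.

```python
import tempfile
from pathlib import Path


def merge_in_groups(paths, group_size):
    """Pack small files into batches of (file name, content) pairs.

    Each batch stands in for one large container file (an assumption;
    a real SequenceFile would be written through the Hadoop API).
    """
    ordered = sorted(paths)
    batches = []
    for start in range(0, len(ordered), group_size):
        group = ordered[start:start + group_size]
        batches.append([(p.name, p.read_text(encoding="utf-8")) for p in group])
    return batches


def make_small_files(directory, count):
    """Create `count` tiny text files that play the role of the small source files."""
    paths = []
    for i in range(count):
        path = Path(directory) / f"obs_{i:03d}.txt"
        path.write_text(f"record {i}\n", encoding="utf-8")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    small_files = make_small_files(tmp, 25)
    # Merge 10 files per batch; a group size of 100 would work the same way.
    merged = merge_in_groups(small_files, group_size=10)

batch_sizes = [len(batch) for batch in merged]
```

Fewer, larger inputs generally mean fewer map tasks to schedule, which is the kind of per-file overhead the merge experiments in the abstract aim to reduce.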