Journal of Integration Technology (集成技术)
2014, Issue 4, pp. 1-9 (9 pages)
big data benchmark; workload-input pairs; program similarity; urban traffic systems; GPS trajectory data
Benchmarks are key tools for evaluating the performance of computing systems. Big data systems, however, still lack a benchmark suite that is widely accepted in either academia or industry: the field is relatively new, and researchers who want to understand how the hardware and software of such systems behave often have no real data to work with. This paper first devises an approach for developing big data benchmarks. It then presents SIAT-Bench, a Hadoop-based big data benchmark suite built from a real traffic big data system and containing five representative workloads from the Shenzhen urban transportation system. To build the suite, program behavior was characterized and the impact of input data sets was quantified using metrics observed at multiple levels, including the micro-architecture, OS, and application layers. Statistical techniques, namely Principal Component Analysis (PCA) and clustering, were then applied to analyze the similarity between workload-input pairs. Finally, representative workloads and their associated input sets were selected for SIAT-Bench according to the clustering results. Experimental results show that SIAT-Bench preserves diversity in program behavior while eliminating redundancy from the benchmark suite.
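The selection procedure summarized in the abstract (characterize each workload-input pair with metrics, reduce them with PCA, cluster, then keep one representative per cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the paper: the pair names, the three metric columns, the synthetic values, the 90% variance threshold, average-linkage agglomerative clustering, and the rule of picking the member nearest the cluster centroid. The abstract does not say which clustering algorithm or metrics the authors used.

```python
import numpy as np


def standardize(X):
    """Z-score each metric column so that no single unit dominates."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant columns carry no information
    return (X - mu) / sigma


def pca_project(X, var_kept=0.9):
    """Project centered data onto the fewest components retaining var_kept."""
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(S))
    return X @ Vt[:k].T


def agglomerate(Z, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(Z))]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def pick_representatives(Z, clusters):
    """Keep, per cluster, the member closest to the cluster centroid."""
    reps = []
    for members in clusters:
        centroid = Z[members].mean(axis=0)
        offsets = np.linalg.norm(Z[members] - centroid, axis=1)
        reps.append(members[int(np.argmin(offsets))])
    return sorted(reps)


# Hypothetical workload-input pairs with synthetic (not measured) metrics.
# Columns (assumed): IPC, cache misses per kilo-instruction, disk I/O in MB/s.
pairs = ["A/small", "A/large", "B/small", "B/large", "C/small", "C/large"]
X = np.array([
    [1.20, 2.0, 10.0],
    [1.18, 2.1, 11.0],
    [0.60, 8.0, 50.0],
    [0.62, 7.9, 52.0],
    [0.90, 5.0, 200.0],
    [0.91, 5.1, 205.0],
])

Z = pca_project(standardize(X))
clusters = agglomerate(Z, n_clusters=3)
reps = pick_representatives(Z, clusters)
print("clusters:", [[pairs[i] for i in c] for c in clusters])
print("representatives:", [pairs[i] for i in reps])
```

In this toy data the two inputs of each workload behave almost identically, so each cluster holds one workload with both inputs and only one pair per workload needs to be kept. That is the kind of redundancy the abstract says the clustering step removes. With real measurements, pairs from the same workload may well land in different clusters, which would argue for keeping more than one input set for that workload.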