计算机工程
COMPUTER ENGINEERING
2014
Issue 3
pp. 67-70, 92
5 pages
data mining%database%cloud computing%concurrency control%frequent subtree%incremental updating
To meet real-world data mining demands characterized by large data volumes, complex workflows, and intensive computation, and to improve the efficiency of traditional incremental tree-update mining by replacing the serial execution of existing algorithms, this paper proposes a Hadoop-based dynamic incremental tree-update method. It first introduces basic concepts of cloud computing, the cloud model, and the execution flow. To address the random allocation strategy used for task scheduling on the existing Hadoop platform, it then designs a resource scheduling and allocation algorithm for a dynamic cloud platform, with the aim of minimizing cost. It presents an incremental tree-update mining algorithm together with two parallel algorithms (DeleteFreqTree and FindNewTree) that carry out incremental mining of tree data. Experimental results show that the parallel algorithms are feasible and efficient, scale well, and can perform update mining on massive tree data.