计算机应用
COMPUTER APPLICATION
2014
Issue z1
pp. 100-102, 106
4 pages in total
data mining; frequent itemset; MapReduce; SON algorithm; Hadoop
Among algorithms for mining frequent itemsets, the SON algorithm can effectively reduce CPU and I/O overhead. However, when run on a single node it is still limited by that node's memory and CPU, and with the advent of massive data a single node can no longer meet the storage requirements. Based on an in-depth study of the SON algorithm, a method for implementing it with the MapReduce programming model is proposed. The algorithm executes in two MapReduce rounds: the first round finds the locally frequent itemsets, and the second round determines the globally frequent itemsets. Experimental results show that the SON algorithm, parallelized with the MapReduce programming model and deployed on a Hadoop cluster, achieves good speedup as the number of partitions increases.
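The two-round structure described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' Hadoop implementation: it only simulates the map and reduce steps in plain Python. The function names (`partition`, `local_frequent_itemsets`, `son`), the choice of Apriori as the per-partition miner, and the sample baskets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def partition(baskets, num_partitions):
    """Split the baskets into contiguous chunks (hypothetical helper)."""
    size = ceil(len(baskets) / num_partitions)
    return [baskets[i:i + size] for i in range(0, len(baskets), size)]


def local_frequent_itemsets(chunk, min_support):
    """Round 1 map step: Apriori on one chunk with the threshold scaled to its size."""
    threshold = min_support * len(chunk)
    counts = Counter(frozenset([item]) for basket in chunk for item in basket)
    current = {s for s, c in counts.items() if c >= threshold}
    frequent = set(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        counts = Counter(c for basket in chunk for c in candidates if c <= basket)
        current = {c for c in candidates if counts[c] >= threshold}
        frequent |= current
        k += 1
    return frequent


def son(baskets, min_support, num_partitions):
    """Simulate both MapReduce rounds of SON; returns {itemset: global count}."""
    baskets = [frozenset(b) for b in baskets]
    chunks = partition(baskets, num_partitions)
    # Round 1: map = mine each chunk locally, reduce = union of the local results.
    candidates = set()
    for chunk in chunks:
        candidates |= local_frequent_itemsets(chunk, min_support)
    # Round 2: map = count every candidate in each chunk, reduce = sum the counts.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            totals.update(c for c in candidates if c <= basket)
    # Keep only candidates that meet the global support threshold.
    threshold = min_support * len(baskets)
    return {c: n for c, n in totals.items() if n >= threshold}


# Illustrative data: 8 baskets, support 0.5, 2 partitions.
baskets = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"a", "b", "c"}, {"a"}, {"b", "d"}, {"a", "b", "d"},
]
result = son(baskets, min_support=0.5, num_partitions=2)
# Globally frequent: {a}: 6, {b}: 6, {c}: 4, {a, b}: 4
```

Round 1 cannot miss a globally frequent itemset: if an itemset fell below the scaled threshold in every chunk, its total count would fall below the global threshold. Round 1 can produce false positives (here `{a, c}`, `{b, c}`, `{b, d}` and `{d}`), which is why the second counting round is needed.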