CAJ | Academic paper

Traditional frequent itemset mining algorithms have certain limitations. The Apriori algorithm must scan the input data repeatedly, which leads to a high I/O load and poor performance. The FP-growth algorithm must build an FP-tree in memory and mine frequent itemsets from it, so it is limited by the memory capacity of the machine. In the big data era these limitations become more prominent because the data to be mined is so large. In this paper we improve the traditional frequent itemset mining algorithms and, building on the Spark framework, implement a distributed frequent itemset mining algorithm (FIMBS). Experimental results show that FIMBS has a significant advantage over the association rule algorithm based on the MapReduce framework.
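The repeated scanning attributed to Apriori above can be made concrete with a small sketch. The code below is a textbook, single-machine Apriori in plain Python, not the FIMBS algorithm (whose details the abstract does not give). The function name `apriori`, the `min_support` parameter (an absolute count), and the `scans` counter are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({itemset: count}, scans) for itemsets with count >= min_support.

    Illustrative textbook Apriori; `scans` counts full passes over the data.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    scans = 0
    # Level-1 candidates: every distinct item. A disk-based version would
    # collect these during its first counting pass.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # One full pass over the data per itemset size k. On disk-resident
        # data, this loop is the source of the high I/O load.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    itemsets, passes = apriori(data, min_support=3)
    print(passes, sorted("".join(sorted(s)) for s in itemsets))
```

The number of passes grows with the size of the largest candidate itemset, which is why approaches that keep the working set in memory across iterations are attractive for this workload.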