Computer Applications and Software (计算机应用与软件)
2014
Issue 4
pp. 297-301, 326
6 pages in total
Apriori algorithm%Frequent itemsets%Boolean matrix of item counts%Divide-and-conquer
To address the shortcomings of the Apriori algorithm, we propose MPIN-Apriori, an improved algorithm based on a Boolean matrix of item counts. The algorithm applies the divide-and-conquer idea to process the dataset in segments, compresses the matrix using the number of items in each transaction, and uses vector intersection together with Apriori (prior-knowledge) pruning to generate local frequent k-itemsets directly, which are finally merged into global frequent k-itemsets. This fundamentally changes the repeated-iteration workflow of Apriori, avoids the join operation, and greatly reduces the memory burden. Experimental results show that, when mining frequent itemsets in large databases, the algorithm is significantly more efficient than Apriori, and the approach is also a useful reference for distributed data mining.
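The pipeline described in the abstract (segmenting the data, compressing by transaction item count, intersecting Boolean vectors, pruning, then merging local results) can be illustrated with a minimal Python sketch. This is not the paper's MPIN-Apriori implementation: the function names, the use of integer bitmasks as Boolean column vectors, and the merge step (taking the union of local frequent k-itemsets as candidates and recounting them over the full dataset, in the style of the classic Partition algorithm) are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from itertools import combinations


def build_columns(transactions):
    """Boolean matrix stored column-wise: item -> bitmask, bit r set if row r holds the item."""
    cols = {}
    for row, txn in enumerate(transactions):
        for item in txn:
            cols[item] = cols.get(item, 0) | (1 << row)
    return cols


def popcount(mask):
    """Number of set bits, i.e. the support count encoded by a Boolean vector."""
    return bin(mask).count("1")


def local_frequent_k(segment, k, min_count):
    """Frequent k-itemsets of one segment (hypothetical helper, not from the paper)."""
    # Compression by item count: a transaction with fewer than k items
    # cannot contain any k-itemset, so its row is dropped.
    kept = [t for t in segment if len(t) >= k]
    cols = build_columns(kept)
    # Pruning: every item of a frequent itemset must itself be frequent.
    items = sorted(i for i, m in cols.items() if popcount(m) >= min_count)
    found = set()
    for combo in combinations(items, k):
        mask = (1 << len(kept)) - 1
        for item in combo:
            mask &= cols[item]  # vector intersection (bitwise AND)
        if popcount(mask) >= min_count:
            found.add(combo)
    return found


def frequent_k_itemsets(transactions, k, min_support, n_segments=2):
    """Divide the data, mine each segment, then merge (assumed Partition-style merge)."""
    txns = [frozenset(t) for t in transactions]
    if not txns:
        return {}
    size = math.ceil(len(txns) / n_segments)
    segments = [txns[i:i + size] for i in range(0, len(txns), size)]
    # A globally frequent itemset is locally frequent in at least one segment,
    # so the union of local results is a complete candidate set.
    candidates = set()
    for seg in segments:
        candidates |= local_frequent_k(seg, k, math.ceil(min_support * len(seg)))
    # Recount the candidates over the whole dataset to discard false positives.
    global_min = math.ceil(min_support * len(txns))
    result = {}
    for cand in candidates:
        count = sum(1 for t in txns if t.issuperset(cand))
        if count >= global_min:
            result[cand] = count
    return result


data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
        {"b", "c"}, {"a", "b", "c"}, {"d"}]
print(sorted(frequent_k_itemsets(data, k=2, min_support=0.5).items()))
# [(('a', 'b'), 3), (('a', 'c'), 3), (('b', 'c'), 3)]
```

In this sketch the k-itemsets are enumerated directly from the surviving frequent items, so no join of (k-1)-itemsets is performed, and only one segment's matrix is held in memory at a time. Whether the paper merges local results by recounting or by some other rule is not stated in the abstract.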