西安文理学院学报(自然科学版)
西安文理學院學報(自然科學版)
서안문이학원학보(자연과학판)
Journal of Xi'an University of Arts and Science (Natural Science Edition)
2015
Issue 4
pp. 26-30
5 pages in total
罗芳%阮群生%李志亮%曾思南
羅芳%阮群生%李志亮%曾思南
라방%원군생%리지량%증사남
关联规则%MapReduce%压缩矩阵%Apriori
關聯規則%MapReduce%壓縮矩陣%Apriori
관련규칙%MapReduce%압축구진%Apriori
association rules%MapReduce%compression matrix%Apriori
针对经典的 Apriori 算法需要多次扫描数据库,不适合大规模数据这个问题,提出了一种改进的 Apriori 算法。该算法采用布尔向量关系运算思想,将事务数据库扫描后转化成压缩矩阵,在 MapReduce 框架下将压缩矩阵进行分块,每块分别被做并列式处理。利用分压缩矩阵快速计算所有的候选项集,从中产生频繁 K-项集,降低了 Apriori 算法的时间复杂度。
針對經典的 Apriori 算法需要多次掃描數據庫,不適合大規模數據這個問題,提出了一種改進的 Apriori 算法。該算法採用布爾向量關係運算思想,將事務數據庫掃描後轉化成壓縮矩陣,在 MapReduce 框架下將壓縮矩陣進行分塊,每塊分別被做並列式處理。利用分壓縮矩陣快速計算所有的候選項集,從中產生頻繁 K-項集,降低了 Apriori 算法的時間複雜度。
침대경전적 Apriori 산법수요다차소묘수거고,불괄합대규모수거저개문제,제출료일충개진적 Apriori 산법。해산법채용포이향량관계운산사상,장사무수거고소묘후전화성압축구진,재 MapReduce 광가하장압축구진진행분괴,매괴분별피주병렬식처리。이용분압축구진쾌속계산소유적후선항집,종중산생빈번 K-항집,강저료 Apriori 산법적시간복잡도。
The classic Apriori algorithm must scan the database repeatedly, which makes it unsuitable for large-scale data. To address this problem, this paper proposes an improved Apriori algorithm. The algorithm is based on relational operations on Boolean vectors: the transaction database is scanned and converted into a compression matrix, which is then partitioned into blocks under the MapReduce framework so that each block can be processed in parallel. The resulting sub-matrices are used to compute all candidate itemsets quickly, and the frequent k-itemsets are generated from these candidates, which reduces the time complexity of the Apriori algorithm.
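The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the compression scheme, so the sketch uses a plain uncompressed Boolean matrix, and the map and reduce steps are ordinary in-process functions rather than Hadoop jobs. All function names and the `block_size` parameter are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def to_boolean_matrix(transactions, items):
    """One row per transaction, one 0/1 column per item (no compression here)."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def split_blocks(matrix, block_size):
    """Partition the matrix into row blocks, one per (simulated) map task."""
    return [matrix[i:i + block_size] for i in range(0, len(matrix), block_size)]


def map_block(block, candidates):
    """Map step: local support of each candidate within one block.

    A row supports a candidate when the logical AND of its columns is 1.
    """
    counts = Counter()
    for row in block:
        for cand in candidates:
            if all(row[j] for j in cand):
                counts[cand] += 1
    return counts


def frequent_itemsets(transactions, min_support, block_size=2):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    blocks = split_blocks(to_boolean_matrix(transactions, items), block_size)
    result = {}
    candidates = [(j,) for j in range(len(items))]  # candidate 1-itemsets
    k = 1
    while candidates:
        # Reduce step: add up the per-block counts.
        total = reduce(lambda a, b: a + b,
                       (map_block(b, candidates) for b in blocks), Counter())
        frequent = [c for c in candidates if total[c] >= min_support]
        for c in frequent:
            result[tuple(items[j] for j in c)] = total[c]
        # Build (k+1)-candidates whose k-subsets are all frequent (Apriori pruning).
        k += 1
        freq_set = set(frequent)
        cols = sorted({j for c in frequent for j in c})
        candidates = [c for c in combinations(cols, k)
                      if all(s in freq_set for s in combinations(c, k - 1))]
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = frequent_itemsets(transactions, min_support=3)
```

In this sketch the transaction database is read once to build the matrix, and every later pass works on the matrix blocks only. In a real MapReduce deployment each block would be handled by a separate mapper, and the summation would run in the reducers.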