成都理工大学学报(自然科学版)
JOURNAL OF CHENGDU UNIVERSITY OF TECHNOLOGY (SCIENCE & TECHNOLOGY EDITION)
2015
Issue 1
pp. 110-114
5 pages in total
data mining%association rule%database compression
The Apriori algorithm is inefficient on large-scale data. To address this problem, this paper proposes an improved method based on partitioning and compressing the database. First, the method stores the data in a temporary array in ascending order of the frequency with which the characteristic data occur. Next, it divides the original transaction database into several mutually disjoint transaction databases, so that each sub-database fits in memory. Finally, it computes the frequent itemsets of the whole database from the frequent itemsets found in each sub-database, thereby eliminating unnecessary redundant data. With this improvement, large data sets can be effectively partitioned and compressed, and association rules can be mined on the sub-databases. The experimental results show that the improved Apriori algorithm is considerably faster and more efficient when mining massive data.
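The abstract gives only an outline of the procedure, so the following is a minimal Python sketch of the general partition-then-merge idea, not the paper's implementation. The function names `local_frequent` and `partitioned_apriori`, the parameters `min_support` and `n_parts`, and the choice to compress by dropping globally infrequent items are all assumptions made for illustration. The merge step relies on a standard property: an itemset that is frequent in the whole database must be frequent in at least one partition.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(transactions, min_count):
    """Level-wise (Apriori-style) mining of one in-memory partition."""
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c >= min_count}
    frequent = set(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Keep the candidates that reach the local support count.
        current = {
            c for c in candidates
            if sum(1 for t in transactions if c <= t) >= min_count
        }
        frequent |= current
        k += 1
    return frequent


def partitioned_apriori(database, min_support, n_parts):
    """Return {itemset: global count} for itemsets with support >= min_support.

    Hypothetical helper: `min_support` is a fraction in (0, 1] and
    `n_parts` is the number of disjoint sub-databases.
    """
    threshold = min_support * len(database)
    # Step 1 (assumed compression): drop items that cannot be frequent and
    # store the rest in ascending order of global frequency. The order does
    # not change the result here; it mirrors the abstract's description.
    freq = Counter(item for t in database for item in t)
    ordered = [
        sorted((i for i in t if freq[i] >= threshold), key=lambda i: (freq[i], i))
        for t in database
    ]
    # Step 2: split into disjoint, contiguous sub-databases.
    size = max(1, ceil(len(ordered) / n_parts))
    parts = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Step 3: the union of locally frequent itemsets is a superset of the
    # globally frequent ones.
    candidates = set()
    for part in parts:
        sets = [frozenset(t) for t in part]
        candidates |= local_frequent(sets, ceil(min_support * len(sets)))
    # Step 4: one more pass over the full database to count global support.
    all_sets = [frozenset(t) for t in database]
    result = {}
    for c in candidates:
        n = sum(1 for t in all_sets if c <= t)
        if n >= threshold:
            result[c] = n
    return result
```

Calling `partitioned_apriori(db, 0.5, 2)` on a list of item sets mines each half separately and then verifies the merged candidates against the full list. In a real out-of-core setting each partition would be read from disk in turn rather than held in one list; that part is omitted here.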