新型工业化
New Industrialization Strategy
2011
Issue 4
pp. 35-44
10 pages in total
毛建旭%毛建频%姚晓玲%刘彩苹
电路与系统%数据挖掘%FP-树%频繁模式%等价类
Circuits and Systems%Data mining%FP-tree%Frequent pattern%Equivalence class
挖掘频繁项集是数据挖掘中最基本的问题之一,而大型数据库庞大的数据使得传统的频繁模式挖掘算法难以适用。针对大型数据库的特点,在分析FP-growth算法的基础上,提出一种基于等价类的大型数据库频繁模式挖掘算法EFP-growth(Equivalent Classes Frequent Patterns-Growth)算法。EFP-growth算法利用项集等价类将关联规则挖掘的项集分成互不相交的子空间的性质,将一个大型数据库分解成多个投影数据库,依次在每一个投影数据库上进行约束频繁项集挖掘。算法尤其适合支持度较小时的大型数据库的挖掘。分析和实验表明EFP-growth算法在挖掘大型数据库时时间和空间的性能上均优于FP-growth算法。而且,随着数据库规模的增大,EFP-growth算法具有更明显的优势。
Mining frequent itemsets is one of the most basic problems in data mining, but the sheer volume of data in large databases makes traditional frequent-pattern mining algorithms hard to apply. Based on the characteristics of large databases and an analysis of the FP-growth algorithm, this paper proposes EFP-growth (Equivalent Classes Frequent Patterns-Growth), an equivalence-class-based algorithm for mining frequent patterns in large databases. EFP-growth exploits the property that itemset equivalence classes partition the itemsets of association rule mining into mutually disjoint subspaces: it decomposes a large database into multiple projected databases and performs constrained frequent itemset mining on each projected database in turn. The algorithm is especially suitable for mining large databases at low support thresholds. Analysis and experiments show that EFP-growth outperforms FP-growth in both time and space when mining large databases, and that its advantage grows as the database gets larger.
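The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' EFP-growth: it assumes that an equivalence class is the set of itemsets sharing the same smallest item (in plain lexicographic order), and it replaces the FP-tree with a simple recursive count over each projected database. The function name mine_by_prefix_classes and the parameter min_count are hypothetical names chosen for this illustration.

```python
from collections import Counter


def mine_by_prefix_classes(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Illustrative assumption: itemsets are grouped into disjoint classes by
    their smallest item, and each class is mined on its own projected
    database. No FP-tree is built here.
    """
    results = {}

    def mine(db, prefix):
        # Count how many transactions of this (projected) database hold each item.
        counts = Counter(item for t in db for item in t)
        frequent = sorted(i for i, c in counts.items() if c >= min_count)
        for item in frequent:
            itemset = prefix + (item,)
            results[frozenset(itemset)] = counts[item]
            # Projected database for the class "prefix + item": keep only
            # transactions containing the item, restricted to frequent items
            # ordered after it, so the classes never overlap.
            later = {j for j in frequent if j > item}
            projected = [[j for j in t if j in later] for t in db if item in t]
            projected = [t for t in projected if t]
            if projected:
                mine(projected, itemset)

    # Deduplicate items inside each transaction before mining.
    mine([sorted(set(t)) for t in transactions], ())
    return results


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(
        mine_by_prefix_classes(demo, 3).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), count)
```

Because each projected database is built and mined independently, only one of them needs to be held in memory at a time, which is the property the abstract relies on for large databases.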