信息技术与信息化
信息技術與信息化
신식기술여신식화
INFORMATION TECHNOLOGY & INFORMATIZATION
2013年
6期
125-128
,共4页
数据挖掘%频繁项列表L%FP-growth%散列表
數據挖掘%頻繁項列錶L%FP-growth%散列錶
수거알굴%빈번항렬표L%FP-growth%산렬표
Data mining%Frequent item table L%FP-growth%Hash table
FP-growth算法是关联规则挖掘中效率较高的算法,以自底向上方式探索树,由FP树产生频繁项集。本文针对FP树构造过程中需多次遍历频繁项列表L的缺点,提出了一种基于散列表的改进算法,实现了项名称关键字到存储地址的映射,进而实现了项名称关键字到其支持度计数的映射。在查找某项的支持度计数时,只需给出其名称关键字,无需从头遍历频繁项列表L,时间复杂度由O(n)提高到O(1)。实验结果表明,改进算法的性能优于原算法,节省了遍历时间,提高了挖掘效率。
FP-growth算法是關聯規則挖掘中效率較高的算法,以自底嚮上方式探索樹,由FP樹產生頻繁項集。本文針對FP樹構造過程中需多次遍歷頻繁項列錶L的缺點,提齣瞭一種基于散列錶的改進算法,實現瞭項名稱關鍵字到存儲地阯的映射,進而實現瞭項名稱關鍵字到其支持度計數的映射。在查找某項的支持度計數時,隻需給齣其名稱關鍵字,無需從頭遍歷頻繁項列錶L,時間複雜度由O(n)提高到O(1)。實驗結果錶明,改進算法的性能優于原算法,節省瞭遍歷時間,提高瞭挖掘效率。
FP-growth산법시관련규칙알굴중효솔교고적산법,이자저향상방식탐색수,유FP수산생빈번항집。본문침대FP수구조과정중수다차편력빈번항렬표L적결점,제출료일충기우산렬표적개진산법,실현료항명칭관건자도존저지지적영사,진이실현료항명칭관건자도기지지도계수적영사。재사조모항적지지도계수시,지수급출기명칭관건자,무수종두편력빈번항렬표L,시간복잡도유O(n)제고도O(1)。실험결과표명,개진산법적성능우우원산법,절성료편력시간,제고료알굴효솔。
FP-growth is one of the most efifcient algorithm among all the association rule algorithms.It is a kind of algorithms that explores the FP-tree by a bottom-up way,then it generates frequent items by mining the FP-tree. This article puts forward an optimizing algorithm which is based on hash table against the defects during the process of FP-tree construction,because it usually traverses the frequent item table L time and time again.The new algorithm has achieved a mapping of a key name to the storage address,thus it also achieved a mapping of a key name to its supporting number.As a result, just give an item-key or item-name when you want to search the supporting number of an item. The hash function will help you to calculate the logical address according to the item-key you provided,you will obtain the supporting number directlly according to the logical address.There is no point at all in traversing the frequent item table L.Obviously,time complexity of searching one supporting number of an item improves from O(n) to O(1).At last,experimental results show that optimizing algorithm is indeed better than the original algorithm in terms of the running time.It spends less time than the original one.It saves traversal time and improvs mining efifciency.