COMPUTER ENGINEERING
2014, Issue 3, pp. 51-54 (4 pages)
Keywords: uncertain data; frequent closed itemsets; data mining; level-wise mining; probability of confidence
For uncertain data, the traditional way of judging whether an itemset is frequent cannot express how reliable that judgment is. In addition, for large datasets the collection of frequent itemsets becomes huge and redundant. To address these two shortcomings, this paper proposes UFCIM, an algorithm that mines frequent closed itemsets from uncertain data. UFCIM extends existing methods for mining frequent itemsets from uncertain data and is built on the level-wise mining algorithm Apriori. It uses the probability of confidence to express how reliable the judgment of frequentness is: the higher the probability of confidence, the more likely the itemset is to be frequent. Frequent closed itemsets are a compact, lossless representation of frequent itemsets, so UFCIM outputs them in place of the much larger set of frequent itemsets. Experimental results show that UFCIM mines frequent closed itemsets from uncertain data quickly and effectively. It reduces redundancy while preserving the accuracy and completeness of the itemsets.
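The abstract does not give UFCIM's exact definitions, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the authors' algorithm. It makes three assumptions:

1. It uses the common uncertain-transaction model, in which each item carries an independent existential probability.
2. It reads the "probability of confidence" of an itemset X as P(sup(X) >= minsup). This value is computed exactly by dynamic programming over the Poisson-binomial support distribution.
3. It replaces closedness with a simplified test: X is dropped when some frequent proper superset has the same expected support. Because a superset's per-transaction containment probability can only be smaller, equal expected support forces equal per-transaction probabilities. Such a superset therefore has an identical support distribution.

The function names, the threshold `tau`, and the toy data are hypothetical.

```python
from itertools import combinations
from math import isclose, prod

# Hypothetical model: each uncertain transaction maps an item to its
# existential probability; items are assumed to occur independently.


def containment_probs(db: list[dict[str, float]], itemset: frozenset) -> list[float]:
    """Probability that each transaction contains every item of `itemset`."""
    return [prod(t.get(i, 0.0) for i in itemset) for t in db]


def frequentness_prob(probs: list[float], minsup: int) -> float:
    """P(support >= minsup) when support is a sum of independent Bernoullis.

    dp[k] is P(support == k) for k < minsup; dp[minsup] accumulates
    P(support >= minsup). Runs in O(len(probs) * minsup).
    """
    if minsup <= 0:
        return 1.0
    dp = [1.0] + [0.0] * minsup
    for p in probs:
        dp[minsup] += dp[minsup - 1] * p  # reach or stay at >= minsup
        for k in range(minsup - 1, 0, -1):  # descending, so dp[k-1] is still old
            dp[k] = dp[k] * (1 - p) + dp[k - 1] * p
        dp[0] *= 1 - p
    return dp[minsup]


def mine_probabilistic_frequent(db, minsup, tau):
    """Apriori-style level-wise search for itemsets with P(sup >= minsup) >= tau.

    Pruning is valid because a superset's containment probability never
    exceeds its subset's, so its frequentness probability cannot be larger.
    Returns {itemset: (frequentness probability, expected support)}.
    """
    result = {}
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    while level:
        frequent = []
        for cand in level:
            probs = containment_probs(db, cand)
            fp = frequentness_prob(probs, minsup)
            if fp >= tau:
                result[cand] = (fp, sum(probs))
                frequent.append(cand)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has a k-subset which is not frequent.
        size = len(level[0]) + 1
        freq_set = set(frequent)
        nxt = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(union - {x} in freq_set for x in union):
                nxt.add(union)
        level = sorted(nxt, key=sorted)
    return result


def closed_only(result):
    """Simplified closedness: drop X if a frequent proper superset has equal expected support."""
    return {
        x: v
        for x, v in result.items()
        if not any(x < y and isclose(v[1], w[1]) for y, w in result.items())
    }


if __name__ == "__main__":
    toy_db = [
        {"a": 0.9, "b": 0.8, "d": 1.0},
        {"a": 0.8, "b": 0.9, "d": 1.0},
        {"a": 1.0, "b": 1.0, "c": 0.4, "d": 1.0},
        {"b": 0.7, "c": 0.9},
    ]
    found = mine_probabilistic_frequent(toy_db, minsup=2, tau=0.7)
    for itemset, (fp, esup) in sorted(
        closed_only(found).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), f"P(frequent)={fp:.4f}", f"E[sup]={esup:.2f}")
```

The toy database uses minsup=2 and tau=0.7. Here {c} is pruned because P(sup >= 2) = 0.36. Both {a} and {a,b} are frequent but not closed. The reason is that d occurs with probability 1 in every transaction containing a, so adding d leaves their expected support unchanged. Published definitions of probabilistic frequent closed itemsets are usually stated over possible worlds. The surrogate used here is simpler and is not claimed to match UFCIM.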