Journal of Shandong University (Natural Science)
2014
No. 1
pp. 71-75
5 pages in total
FP-tree%conditional pattern tree%associative classification%significance degree%distributed information mining
Traditional information mining techniques can no longer meet the increasingly complex application requirements of big data environments, and distributed data mining offers one way to address this problem. A distributed associative classification algorithm based on an improved frequent pattern tree (FP-Tree) is therefore presented. First, the FP-Tree is optimized at each local node to generate a local conditional pattern tree (CFP-Tree), and a global CFP-Tree is then constructed by transmitting the local CFP-Trees between nodes. Second, while the global CFP-Tree is mined, the initial global significant classification rules are obtained by calculating the significance degree. Finally, pruning strategies are used to select a small rule set from which the global associative classifier is constructed. Experimental results show that the algorithm effectively reduces network traffic and improves mining efficiency while preserving pruning quality and the statistical significance of the rules, improving both classification accuracy and the ability to discover implicit rules.