计算机科学与探索
計算機科學與探索
계산궤과학여탐색
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014年
7期
790-801
,共12页
孙鹤立%陈强%刘玮%黄健斌%邹建华
孫鶴立%陳彊%劉瑋%黃健斌%鄒建華
손학립%진강%류위%황건빈%추건화
频繁子图挖掘%MapReduce%Hadoop平台
頻繁子圖挖掘%MapReduce%Hadoop平檯
빈번자도알굴%MapReduce%Hadoop평태
frequent subgraph mining%MapReduce%Hadoop platform
频繁子图挖掘是数据挖掘领域的一个重要问题,并且有着广泛的应用。在Hadoop平台上实现了一种基于MapReduce的高效频繁子图挖掘算法Cloud-GFSG(cloud-global frequent subgraph)。该算法基于Apriori思想,在扩展边生成新的子图时,使用已经挖掘出的k-1阶的频繁子图生成k阶的频繁子图。同时,检查是否存在待扩展生成的子图,设定生成的频繁子图表示规则,保证了频繁子图信息的唯一性。较同类算法相比,该算法在挖掘频繁子图时更具通用性,并且在扩展边时避免产生大量的复制图,从而使得算法的正确性得以保证,且运行效率显著提高。
頻繁子圖挖掘是數據挖掘領域的一箇重要問題,併且有著廣汎的應用。在Hadoop平檯上實現瞭一種基于MapReduce的高效頻繁子圖挖掘算法Cloud-GFSG(cloud-global frequent subgraph)。該算法基于Apriori思想,在擴展邊生成新的子圖時,使用已經挖掘齣的k-1階的頻繁子圖生成k階的頻繁子圖。同時,檢查是否存在待擴展生成的子圖,設定生成的頻繁子圖錶示規則,保證瞭頻繁子圖信息的唯一性。較同類算法相比,該算法在挖掘頻繁子圖時更具通用性,併且在擴展邊時避免產生大量的複製圖,從而使得算法的正確性得以保證,且運行效率顯著提高。
빈번자도알굴시수거알굴영역적일개중요문제,병차유착엄범적응용。재Hadoop평태상실현료일충기우MapReduce적고효빈번자도알굴산법Cloud-GFSG(cloud-global frequent subgraph)。해산법기우Apriori사상,재확전변생성신적자도시,사용이경알굴출적k-1계적빈번자도생성k계적빈번자도。동시,검사시부존재대확전생성적자도,설정생성적빈번자도표시규칙,보증료빈번자도신식적유일성。교동류산법상비,해산법재알굴빈번자도시경구통용성,병차재확전변시피면산생대량적복제도,종이사득산법적정학성득이보증,차운행효솔현저제고。
Frequent subgraph mining is an important problem in data mining domain and has been used widely. This paper proposes an efficient algorithm Cloud-GFSG (cloud-global frequent subgraph), by using MapReduce on Hadoop platform for mining frequent subgraphs. The algorithm is based on the principle of Apriori. It uses the discovered frequent subgraphs whose support is k-1 to generate the candidate frequent subgraphs whose support is k when it gener-ates new subgraphs by extending edge. Meanwhile, it checks whether there exists any subgraph which would be gener-ated and sets the frequent subgraph generation rules to ensure the uniqueness of the frequent subgraphs. Compared with the state-of-the-art algorithms, the proposed algorithm has more general function and can avoid generating replicate graphs while extending a new edge. Therefore, its correctness can be ensured and the efficiency had been improved significantly.