计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2015, No. 9, pp. 135-141 (7 pages)
Zheng Haiyan (郑海雁); Wang Yuanfang (王远方); Xiong Zheng (熊政); Li Kunming (李昆明); Chong Zhihong (崇志宏); Yin Fei (尹飞)
proximity pattern; label-set constraint; parallelization
Proximity patterns are derived from frequent patterns and combine characteristics of frequent itemsets and frequent subgraphs. Research on proximity patterns has so far concentrated on unlabeled graphs, with social networks, semantic networks, and smart grids as the main application scenarios. Because mining proximity patterns involves both frequent itemset mining and frequent subgraph mining, existing frequent-pattern mining algorithms cannot solve the proximity-pattern mining problem well. Building on the proximity-pattern structure, this paper extends it to labeled graphs, introduces a label-set constraint, and designs a label-set-constrained proximity-pattern mining algorithm, LCPP (Label-Constraint Proximity Pattern). The algorithm is deployed in parallel on the MapReduce computing model, compensating for the inefficiency of the open-source pFP algorithm on large-scale data. Experimental results show that the algorithm is effective, speeds up computation, and scales well, and that LCPP is an excellent complement to pFP.
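The abstract describes LCPP only at a high level. As a rough illustration of how label-set-constrained proximity mining can be split into map and reduce phases, the Python sketch below simulates a single MapReduce pass in memory. It is not the authors' LCPP algorithm. Several details are assumptions: each node contributes one "transaction" made of the labels in its 1-hop neighborhood, the label-set constraint is read as "only labels in a given set may form patterns", and support is counted by brute-force subset enumeration, which is only practical for tiny inputs. The names `map_node`, `reduce_pattern`, and `mine` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_node(node, graph, labels, constraint, max_size):
    """Map phase (hypothetical): build one transaction from a node's 1-hop
    neighborhood and emit (pattern, 1) for every candidate label subset."""
    neighborhood = {node, *graph.get(node, ())}
    transaction = set()
    for v in neighborhood:
        transaction |= labels.get(v, set())
    # Assumed label-set constraint: only labels in `constraint` may form patterns.
    items = sorted(transaction & constraint)
    for size in range(1, min(max_size, len(items)) + 1):
        for pattern in combinations(items, size):
            yield pattern, 1


def reduce_pattern(pattern, counts, min_support):
    """Reduce phase: sum the partial counts for one pattern and keep it
    only if its support reaches the threshold."""
    support = sum(counts)
    return (pattern, support) if support >= min_support else None


def mine(graph, labels, constraint, min_support, max_size=3):
    """Simulate one MapReduce pass in memory: map every node, shuffle by key,
    then reduce each key group."""
    nodes = sorted(set(graph) | set(labels))
    grouped = defaultdict(list)  # shuffle: pattern -> list of emitted counts
    for node in nodes:
        for pattern, one in map_node(node, graph, labels, constraint, max_size):
            grouped[pattern].append(one)
    frequent = {}
    for pattern, counts in grouped.items():
        kept = reduce_pattern(pattern, counts, min_support)
        if kept is not None:
            frequent[kept[0]] = kept[1]
    return frequent


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: []}
    labels = {1: {"a"}, 2: {"b"}, 3: {"a", "c"}, 4: {"b", "d"}}
    print(mine(graph, labels, constraint={"a", "b", "c"}, min_support=3))
```

On this toy graph the sketch prints `{('a',): 3, ('b',): 4, ('a', 'b'): 3}`: label `d` is filtered out by the constraint, and `c` co-occurs in only two neighborhoods. In a real Hadoop job the mapper and reducer would run as separate tasks and the framework would perform the shuffle; the in-memory `grouped` dictionary only stands in for that step.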