燕山大学学报
JOURNAL OF YANSHAN UNIVERSITY
2014, No. 6, pp. 503-515 (13 pages)
邹目权 (Zou Muquan); 王丽珍 (Wang Lizhen); 姚华传 (Yao Huachuan); 芦俊丽 (Lu Junli)
generalized association analysis; implication-constraint framework; constraint mapping; reasonable threshold; multi-knowledge tree
In this paper, transaction-based and non-transaction-based association analysis are together called generalized association analysis. Transaction-based association analysis relies mainly on the support-confidence framework. Non-transaction-based association analysis, such as spatial co-location pattern mining, usually adopts the participation-conditional probability framework.

First, building on a discussion of the correctness, reliability and interestingness of strong association rules, an implication-constraint framework is proposed. Second, reasonable ranges for the minimum support _ and the minimum confidence _ are proposed and proved. The minimum confidence is narrowed from (0,1] under the support-confidence framework to (0.5,1] under the implication-constraint framework, and the minimum support from (0,1] to (0,_]. Third, a random vertex maximum clique partition method is presented. It transforms non-transaction-based association analysis into transaction-based association analysis, so that generalized association analysis can be unified as transaction-based association analysis. Fourth, based on the concept of mapping, constraints are classified as before-, during- and after-mining constraints, which gives a formal treatment of how constraints are applied. Fifth, a tree storage structure called the multi-knowledge tree is proposed, built on dense and sparse dimensions. It effectively reduces the space complexity of the algorithms whether or not the frequent itemsets satisfy the downward closure property. Existing algorithms cannot quickly update the mining results after data are inserted, deleted or modified. The multi-pruning algorithm proposed in this paper obtains the new strong association rules promptly after such updates. Finally, extensive experiments verify the effectiveness and efficiency of the proposed theory and algorithms.
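The narrowed confidence range can be illustrated with a small sketch. It computes support and confidence in the usual way: sup(X) is the fraction of transactions containing X, and conf(X→Y) = sup(X∪Y)/sup(X). It then adds a check that is our own illustration, not necessarily the paper's proof. Because conf(X→¬Y) = 1 − conf(X→Y), any rule accepted with confidence ≤ 0.5 is matched or beaten by its contrary rule. That is one reason a minimum confidence above 0.5 suits an implication reading of rules. The toy transactions and the names `strong_rules` and `dominated_by_negation` are hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: each transaction is a set of items.
T = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """conf(lhs -> rhs) = sup(lhs ∪ rhs) / sup(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def strong_rules(transactions, min_sup, min_conf):
    """Single-item rules a -> b with sup({a, b}) >= min_sup and conf >= min_conf."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        sup_ab = support({a, b}, transactions)
        if sup_ab == 0 or sup_ab < min_sup:
            continue  # sup_ab > 0 also guarantees sup({a}) > 0 below
        conf = confidence({a}, {b}, transactions)
        if conf >= min_conf:
            rules.append((a, b, sup_ab, conf))
    return rules

def dominated_by_negation(conf):
    """True when the contrary rule a -> not b (confidence 1 - conf) is at least as strong."""
    return 1 - conf >= conf

# Support-confidence style: any min_conf in (0, 1] is allowed.
loose = strong_rules(T, min_sup=0.25, min_conf=0.3)
weak = [(a, b) for a, b, _, conf in loose if dominated_by_negation(conf)]
print(len(loose), weak)  # 6 [('butter', 'milk'), ('milk', 'butter')]

# Restricting min_conf to (0.5, 1] filters out such rules.
strict = strong_rules(T, min_sup=0.25, min_conf=0.6)
print(len(strict), any(dominated_by_negation(c) for *_, c in strict))  # 4 False
```

Only single-item rules are enumerated to keep the sketch short. The same check applies unchanged to rules with larger antecedents and consequents.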
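Spatial co-location data can illustrate how non-transactional data might be turned into transactions. Here, instances of spatial features count as neighbours when they lie within a distance d. The sketch below is a hypothetical reading of a random-vertex clique partition, not the paper's algorithm. It repeatedly picks a random unassigned instance and greedily grows a clique that is maximal among the still-unassigned instances. It then emits the clique's feature types as one transaction. The function names, the `(feature, x, y)` point format, the Euclidean neighbour rule and the greedy growth order are all assumptions.

```python
import math
import random

def neighbours(points, d):
    """Adjacency sets: instances i and j are neighbours if their Euclidean distance is <= d."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (_, xi, yi), (_, xj, yj) = points[i], points[j]
            if math.hypot(xi - xj, yi - yj) <= d:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def clique_partition(points, d, seed=0):
    """Hypothetical partition of instances into cliques; each clique yields one transaction."""
    rng = random.Random(seed)
    adj = neighbours(points, d)
    unassigned = set(range(len(points)))
    transactions = []
    while unassigned:
        v = rng.choice(sorted(unassigned))  # random start vertex
        clique = {v}
        # Candidates: unassigned vertices adjacent to every current clique member.
        candidates = adj[v] & unassigned
        while candidates:
            u = min(candidates)  # deterministic greedy choice
            clique.add(u)
            candidates &= adj[u]  # u has no self-loop, so it drops out here
        unassigned -= clique
        # A transaction records which feature types occur together in the clique.
        transactions.append(frozenset(points[i][0] for i in clique))
    return transactions

# Hypothetical instances: (feature type, x, y).
pts = [("A", 0, 0), ("B", 0.5, 0), ("C", 0.5, 0.5), ("A", 10, 10), ("B", 10.4, 10)]
tx = clique_partition(pts, d=1.0)
print(sorted(sorted(t) for t in tx))  # [['A', 'B'], ['A', 'B', 'C']]
```

The resulting transactions can be mined with any support-confidence procedure. Two instances of the same feature inside one clique collapse into a single item, which is a simplification of this sketch.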