科技通报
BULLETIN OF SCIENCE AND TECHNOLOGY
2015
Issue 6
pp. 211-213
3 pages
Bayesian rough set%frequent item mining%big data
Frequent itemset mining on big data is a key step in association rule mining, and effective frequent item mining improves the access efficiency of large-volume databases. Traditional methods mine frequent itemsets in big data with an FP-Growth-based rough set mining algorithm, which has poor scalability and fault tolerance. This paper proposes a frequent item mining technique for big data based on Bayesian rough sets. It introduces the concept of a suffix item table, whose construction preserves the complete information of the frequent itemsets. The method builds an FP-Tree and generates the closed frequent itemsets, computes the sample density and extracts the points in high-density regions as the set of cluster centers, constructs the suffix item table and partitions it into several sets by support, fuses the attribute sets within each reduct, and improves the mining algorithm using the Bayesian rough set model of variable precision rough sets. Simulation results show that the algorithm is unaffected by variable parameters, is robust, and achieves high mining accuracy with a short running time. The algorithm has broad application prospects in artificial intelligence and data mining.
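The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general Bayesian rough set idea, not the authors' algorithm. It assumes the standard formulation in which an equivalence class E is assigned to the positive region of a target concept X when P(X | E) > P(X), to the negative region when P(X | E) < P(X), and to the boundary otherwise. Because the prior P(X) takes the place of the tunable precision threshold of variable precision rough sets, no extra parameter has to be chosen. The function name `bayesian_rough_regions` and the toy decision table are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def bayesian_rough_regions(rows, condition_idx, is_target):
    """Assign each equivalence class to the positive, negative, or boundary region.

    rows          -- list of tuples, one per object in the universe
    condition_idx -- indices of the condition attributes that define equivalence
    is_target     -- predicate that is True for objects in the target concept X
    """
    # Group objects into equivalence classes by their condition-attribute values.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in condition_idx)
        classes[key].append(row)

    # Prior probability P(X) over the whole universe (exact arithmetic).
    prior = Fraction(sum(1 for r in rows if is_target(r)), len(rows))

    regions = {"positive": [], "negative": [], "boundary": []}
    for key, members in classes.items():
        # Conditional probability P(X | E) within one equivalence class E.
        cond = Fraction(sum(1 for r in members if is_target(r)), len(members))
        if cond > prior:
            regions["positive"].append(key)
        elif cond < prior:
            regions["negative"].append(key)
        else:
            regions["boundary"].append(key)
    return prior, regions


# Hypothetical toy decision table: (attribute a, attribute b, decision).
rows = [
    ("a1", "b1", "yes"),
    ("a1", "b1", "yes"),
    ("a1", "b2", "yes"),
    ("a1", "b2", "no"),
    ("a2", "b1", "no"),
    ("a2", "b1", "no"),
    ("a2", "b2", "yes"),
    ("a2", "b2", "no"),
]

prior, regions = bayesian_rough_regions(
    rows, condition_idx=(0, 1), is_target=lambda r: r[2] == "yes"
)
```

In this toy table the prior is 1/2, so the class (a1, b1) lands in the positive region, (a2, b1) in the negative region, and the two classes whose conditional probability equals the prior stay in the boundary. Exact fractions are used so that the equality test for the boundary is not affected by floating-point rounding.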