COMPUTER ENGINEERING
2014
Issue 3
pp. 55-58, 75
5 pages in total
data mining%data stream%sliding window%matrix%Top-k frequent itemset
Traditional data mining algorithms generate large numbers of redundant itemsets when mining frequent itemsets, which hurts mining efficiency. To address this, a matrix-based algorithm for mining Top-k frequent itemsets over data streams is proposed. Two 0-1 matrices are introduced: a transaction matrix and a 2-itemset matrix. The transaction matrix represents the transaction list in the sliding window model, and the 2-itemset matrix is obtained by computing the support of each row. Candidate itemsets are derived from the 2-itemset matrix, and their supports are computed by applying a logical AND to the corresponding rows of the transaction matrix, which yields the Top-k frequent itemsets. The mining results are stored in a data dictionary, so that when a user issues a query the Top-k frequent itemsets can be output in descending order of support. Experimental results show that the algorithm avoids generating redundant itemsets during mining and achieves high time efficiency while maintaining accuracy.
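The steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the matrix layout, the pruning threshold, or the ranking rule, so the following are all assumptions made for illustration. Rows of the transaction matrix correspond to items and columns to the transactions in the current window; a pair is marked 1 in the 2-itemset matrix when its support reaches a hypothetical `min_sup` parameter; single items count toward the Top-k; ties are broken by itemset length and then lexicographically. The function names are likewise hypothetical.

```python
from itertools import combinations


def build_transaction_matrix(window, items):
    """Assumed layout: one 0-1 row per item, one column per transaction."""
    return {item: [1 if item in t else 0 for t in window] for item in items}


def row_and(a, b):
    """Element-wise logical AND of two 0-1 rows."""
    return [x & y for x, y in zip(a, b)]


def top_k_itemsets(window, k, min_sup=1):
    """Return up to k (itemset, support) pairs in descending order of support."""
    items = sorted({i for t in window for i in t})
    tm = build_transaction_matrix(window, items)

    # Support of a single item is the sum of its row.
    freq_items = [i for i in items if sum(tm[i]) >= min_sup]
    # The "data dictionary": itemset -> support.
    results = {(i,): sum(tm[i]) for i in freq_items}

    # 2-itemset matrix (upper triangle): 1 if the pair is frequent, else 0.
    pair_ok = {
        (a, b): 1 if sum(row_and(tm[a], tm[b])) >= min_sup else 0
        for a, b in combinations(freq_items, 2)
    }

    # Grow candidates level by level; an item is appended only if it forms
    # a frequent pair with every item already in the itemset.
    level = [(i,) for i in freq_items]
    while level:
        next_level = []
        for itemset in level:
            for j in freq_items:
                if j <= itemset[-1]:
                    continue  # keep itemsets sorted to avoid duplicates
                if not all(pair_ok[(i, j)] for i in itemset):
                    continue
                # AND the corresponding rows to get the candidate's support.
                row = tm[j]
                for i in itemset:
                    row = row_and(row, tm[i])
                support = sum(row)
                if support >= min_sup:
                    candidate = itemset + (j,)
                    results[candidate] = support
                    next_level.append(candidate)
        level = next_level

    ranked = sorted(results.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))
    return ranked[:k]


# Example window of five transactions over items a, b, c.
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
top4 = top_k_itemsets(window, k=4, min_sup=2)
```

The sketch rebuilds the matrices from the whole window on every call. A streaming version would instead drop the oldest column and append a new one as the window slides, which the abstract implies but does not describe.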