模式识别与人工智能
Pattern Recognition and Artificial Intelligence
2015
Issue 10
pp. 922-929
8 pages
黄再祥%周忠眉%何田中%郑艺峰
数据挖掘%关联分类%不平衡数据%规则强度%相关度
Data Mining%Associative Classification%Imbalanced Dataset%Rule Strength%Correlation
由于多类不平衡数据中某些类别的样例数特别少,使得基于支持度-置信度的关联分类方法在这些类上产生的规则较少,甚至没有,从而导致这些类别的样例很难准确分类。针对此问题,文中提出改进的多类不平衡数据关联分类算法。为了提取更多小类的规则,根据项集与类别的正相关度提取规则。为了提高小类规则的优先级,提出利用项集类分布规则强度排序规则。此外,为解决规则冲突或无规则匹配问题,结合 KNN 分类新样例。实验表明,与基于支持度-置信度的关联分类方法相比,文中算法能提取更多的小类规则,且提高小类规则的优先级,在多类不平衡数据上取得较高的 G-mean 值和 F-score 值。
In multiclass imbalanced datasets, some classes contain very few instances, so support-confidence based associative classification algorithms generate few rules, or none at all, for these classes. As a result, instances of these minority classes are difficult to classify correctly. To address this problem, an improved associative classification algorithm for multiclass imbalanced datasets is proposed. To extract more rules for minority classes, rules are extracted according to the positive correlation between itemsets and classes. To raise the priority of minority-class rules, a rule strength based on the class distribution of itemsets is used to rank rules. Finally, to handle cases where no rule matches or the matched rules conflict, a k-nearest-neighbor (KNN) algorithm is incorporated to classify new instances. Experimental results show that, compared with support-confidence based associative classification, the proposed algorithm extracts more minority-class rules and ranks them higher, and thus achieves higher G-mean and F-score values on multiclass imbalanced datasets.
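The three steps named in the abstract (correlation-based rule extraction, strength-based ranking, and a KNN fallback) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption made for illustration: lift greater than 1 stands in for "positive correlation", a class-size-normalized coverage ratio stands in for "rule strength", a conflict is taken to mean that the top-ranked matched rules disagree, and the KNN step uses Hamming distance. The names `mine_rules` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(X, y, max_len=2):
    """Return (strength, itemset, label) rules, strongest first.

    Assumption: a rule is kept when lift(itemset -> label) > 1, used here
    as a stand-in for the positive-correlation test. No minimum support
    is applied, so rare classes can still produce rules.
    """
    n = len(y)
    class_count = Counter(y)
    cover = defaultdict(Counter)  # itemset -> class -> number of covered rows
    for row, label in zip(X, y):
        items = tuple(enumerate(row))  # (attribute index, value) pairs
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                cover[itemset][label] += 1

    rules = []
    for itemset, dist in cover.items():
        total = sum(dist.values())
        # Assumed strength: per-class coverage rate, normalized across
        # classes, so that class size does not dominate the ranking.
        rates = {c: dist[c] / class_count[c] for c in dist}
        rate_sum = sum(rates.values())
        for label, hits in dist.items():
            lift = (hits / total) / (class_count[label] / n)
            if lift > 1.0:
                rules.append((rates[label] / rate_sum, itemset, label))
    rules.sort(key=lambda r: (-r[0], len(r[1])))
    return rules


def classify(x, rules, X, y, k=3):
    """Use the strongest matching rule; fall back to KNN otherwise."""
    items = set(enumerate(x))
    matched = [r for r in rules if set(r[1]) <= items]
    if matched:
        best = matched[0][0]
        top_labels = {label for s, _, label in matched if s == best}
        if len(top_labels) == 1:  # no conflict among the top-ranked rules
            return top_labels.pop()
    # No match, or conflicting top rules: majority vote of the k nearest
    # training rows under Hamming distance (an assumed metric).
    order = sorted(range(len(X)),
                   key=lambda i: sum(a != b for a, b in zip(X[i], x)))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]


# Toy data: class "C" has only 2 of 10 instances.
X = [("sunny", "hot"), ("sunny", "mild"), ("sunny", "hot"), ("sunny", "mild"),
     ("rainy", "hot"), ("rainy", "mild"), ("rainy", "hot"), ("rainy", "mild"),
     ("foggy", "cold"), ("foggy", "cold")]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C"]

rules = mine_rules(X, y)
print(classify(("foggy", "hot"), rules, X, y))
```

In this toy data the rule "foggy implies C" covers only 20% of the rows, so a typical minimum-support threshold would discard it. Under the assumed lift test it is retained, and the normalized strength ranks it above the weaker "hot" rules that point to the majority classes.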