计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2015
Issue 13
pp. 255-258, 270
5 pages in total
C4.5 algorithm; boundary theorem; Gini index; Occam’s razor; resubstitution estimate
C4.5 is currently the most influential decision tree classification algorithm, but it still has some deficiencies. To reduce the time C4.5 spends discretizing continuous-valued attributes, a simplified algorithm is proposed that, based on the boundary theorem of Fayyad and Irani, uses the Gini index in place of information entropy after the continuous-valued attributes have been discretized. To address overfitting in decision tree methods, the algorithm is further improved by applying the resubstitution estimate, following Occam’s razor. Applied to financial loan data, the improved C4.5 algorithm reduces execution time by an average of 8.74% and model complexity by an average of 6.26% while preserving accuracy, which demonstrates the effectiveness of the algorithm.
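
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea it names, not the authors' algorithm. It restricts candidate thresholds for a continuous attribute to class boundary points (the Fayyad and Irani result) and scores each candidate with the weighted Gini index rather than information entropy. The function names, the use of midpoints as thresholds, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum_k p_k^2 (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def boundary_cut_points(values, labels):
    """Candidate thresholds restricted to class boundaries.

    A midpoint between two adjacent distinct attribute values is kept only
    if the examples at those two values do not all share a single class.
    Using the midpoint as the threshold is an assumption of this sketch.
    """
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best boundary cut, or None."""
    n = len(labels)
    best = None
    for t in boundary_cut_points(values, labels):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: loan amounts (in thousands) and repayment labels.
    amounts = [1, 2, 3, 4, 5, 6]
    repaid = ["n", "n", "n", "y", "y", "y"]
    print(boundary_cut_points(amounts, repaid))
    print(best_split(amounts, repaid))
```

In this toy data, six sorted values would give five candidate midpoints if every gap were tested, but only one of them lies on a class boundary, so only one split is scored. Evaluating the Gini index also avoids the logarithm needed for entropy. Both effects point in the direction of the runtime reduction the abstract reports, although the sketch makes no claim to reproduce the reported figures.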