计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015
Issue 5
pp. 1209-1213
5 pages in total
决策树%CART算法%分割阈值%Fayyad边界点判定定理%关键度度量
decision tree%CART algorithm%segmentation threshold%Fayyad boundary point determination principle%key decision factor
利用Fayyad边界点判定原理对CART决策树选取连续属性的分割阈值的方法进行改进,由Fayyad边界点判定原理可知,建树过程中选取连续属性的分割阈值时,不需要检查每一个分割点,只要检查样本排序后,该属性相邻不同类别的分界点即可;针对样本集主类类属分布不平衡时,样本量占相对少数的小类属样本不能很好地对分类进行表决的情况,采用关键度度量的方法进行改进。基于这两点改进构建CART分类器。实验结果表明,Fayyad边界点判定原理适用于CART算法,利用改进后的CART算法生成决策树的效率提高了近45%,在样本集主类类属分布不平衡的情况下,分类准确率也略有提高。
The Fayyad boundary point determination principle is used to improve how the CART decision tree selects segmentation thresholds for continuous-valued attributes. By this principle, choosing the threshold for a continuous attribute during tree construction does not require checking every candidate split point; once the samples are sorted on that attribute, only the boundary points between adjacent samples of different classes need to be checked. In addition, when the main classes of the sample set are imbalanced, samples from the relatively small minority classes cannot vote effectively on the classification, so a key decision factor measure is introduced to address this. A CART classifier is constructed on these two improvements. The experimental results show that the Fayyad boundary point determination principle is applicable to the CART algorithm: the efficiency of building the decision tree improves by nearly 45%, and when the main classes of the sample set are imbalanced, the classification accuracy of the improved algorithm is also slightly higher than that of the original.
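The threshold-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`all_cut_points`, `boundary_cut_points`, `best_threshold`) and the toy data are assumptions made for illustration, and the split score is the standard weighted Gini impurity used by CART. The sketch covers only the boundary-point step; the key decision factor is not shown because the abstract does not define it.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(pairs, threshold):
    """Weighted Gini impurity of the split x <= threshold vs. x > threshold."""
    left = [y for x, y in pairs if x <= threshold]
    right = [y for x, y in pairs if x > threshold]
    n = len(pairs)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def all_cut_points(values):
    """Baseline: midpoints between every pair of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def boundary_cut_points(values, labels):
    """Midpoints between adjacent distinct values, kept only at class boundaries."""
    classes_at = {}
    for x, y in zip(values, labels):
        classes_at.setdefault(x, set()).add(y)
    distinct = sorted(classes_at)
    cuts = []
    for a, b in zip(distinct, distinct[1:]):
        # Skip the cut only when both neighbouring values hold one and the same class.
        if len(classes_at[a] | classes_at[b]) > 1:
            cuts.append((a + b) / 2)
    return cuts


def best_threshold(values, labels, candidates):
    """Candidate threshold with the lowest weighted Gini impurity."""
    pairs = list(zip(values, labels))
    return min(candidates, key=lambda t: weighted_gini(pairs, t))


# Hypothetical toy attribute: eight samples, two classes.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "b", "b", "a", "b", "b"]

full = all_cut_points(values)
reduced = boundary_cut_points(values, labels)
print(len(full), len(reduced))
print(best_threshold(values, labels, full), best_threshold(values, labels, reduced))
```

On this toy data the baseline evaluates seven candidate thresholds and the boundary-point version evaluates three, yet both select the same threshold. The saving grows with the length of the single-class runs in the sorted attribute.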