计算机应用与软件
Computer Applications and Software
2015
Issue 8
pp. 215-219, 233
6 pages in total
Software defect prediction%Imbalance%Sampling%Random forest%Cost-sensitive
Current research on software defect prediction (SDP) focuses mainly on two aspects: the sources from which historical data are obtained and the prediction methods. However, the historical defect data obtained are often class-imbalanced, and traditional prediction methods then misclassify defect data at a very high rate. To address this problem, we propose using an imbalanced classification method based on statistical sampling to predict software defects. By empirically comparing the prediction performance of 12 combinations of existing sampling and classification algorithms, we find that the combination of SpreadSubsampling and random forest (SP-RF) performs best overall but has a relatively high false positive rate (FPR). To further improve prediction performance, and to address the shortcomings of the original SP-RF method, which introduces considerable noise into the original data and loses information, we propose an improved SP-RF-based adaptive random forest algorithm with built-in balanced sampling (IBSBA-RF). Experiments show that IBSBA-RF significantly reduces the FPR of the prediction results and further improves their AUC and Balance values.
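As a rough illustration of two ideas named in the abstract above, the following Python sketch shows (a) random undersampling of the majority class down to a maximum class spread, in the spirit of spread subsampling, and (b) the FPR and Balance measures computed from predictions. It is not the authors' SP-RF or IBSBA-RF implementation, which the abstract does not specify. The function names, the `max_spread` parameter, and the choice of label `1` for defective modules are assumptions made for this example, and the Balance formula used is the definition common in the defect prediction literature (normalized distance from the ideal point pf = 0, pd = 1), which may differ from the paper's.

```python
import math
import random


def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly drop rows from larger classes until no class has more than
    max_spread times the size of the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    smallest = min(len(group) for group in by_class.values())
    cap = max(1, int(smallest * max_spread))
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Classes at or below the cap are kept whole; larger ones are sampled.
        kept = group if len(group) <= cap else rng.sample(group, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


def fpr_pd_balance(y_true, y_pred, positive=1):
    """Return (pf, pd, balance), where pf is the false positive rate and
    pd is the probability of detection (recall on the positive class)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Balance: 1 minus the normalized distance to the ideal point (pf=0, pd=1).
    balance = 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
    return pf, pd, balance
```

In a pipeline like the one the abstract outlines, the subsampled rows would be passed to a random forest learner, and the measures would be computed on held-out predictions. Subsampling discards majority-class rows, which is one way the information loss mentioned in the abstract can arise.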