计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 12
pp. 1442-1451
10 pages
李勇%黄志球%房丙午%王勇
software defect prediction%cost-sensitive classification%optimal cost factor%decision tree%ensemble algorithm
Software defect prediction is an important way to improve software testing efficiency and ensure software reliability. Because a software defect prediction model incurs different costs for different kinds of misclassification of software modules, this paper proposes a cost-sensitive classification method for constructing software defect prediction models. For code attribute metric data, training samples are drawn repeatedly at random with replacement, in the manner of Bagging, to build cost-sensitive decision tree base classifiers; the base classifiers are then combined by voting to predict defects in software modules. A method for selecting an approximately optimal value of the cost factor during model construction is also given. Simulation experiments on the public NASA software defect prediction datasets show that, while maintaining the defect prediction rate, the proposed method clearly lowers the false alarm rate, and that its comprehensive evaluation measures, AUC and F-measure, are both better than those of existing methods.
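The pipeline summarized in the abstract (bootstrap sampling, cost-sensitive tree base classifiers, majority voting, cost-factor selection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the base classifier is a depth-1 tree (a stump) rather than a full decision tree, the cost factor `cost_fn` is read as the cost of missing a defective module relative to a false alarm costing 1, and `pick_cost_factor` is a plain grid search on validation F-measure that stands in for the paper's own selection procedure, which the abstract does not detail. All function names are hypothetical.

```python
import random

def fit_stump(X, y, cost_fn):
    """Fit a depth-1 tree that minimizes total misclassification cost.
    Assumption: a missed defect (false negative) costs cost_fn,
    a false alarm (false positive) costs 1."""
    def leaf(labels):
        pos = sum(labels)            # defective modules in this leaf
        neg = len(labels) - pos      # clean modules in this leaf
        # Predict "defective" when missing the defects would cost more
        # than the false alarms raised on the clean modules.
        return (1, neg) if pos * cost_fn > neg else (0, pos * cost_fn)

    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            (pl, cl), (pr, cr) = leaf(left), leaf(right)
            if best is None or cl + cr < best[0]:
                best = (cl + cr, j, t, pl, pr)
    _, j, t, pl, pr = best
    return lambda row: pl if row[j] <= t else pr

def fit_bagged(X, y, cost_fn, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], cost_fn))
    return trees

def predict(trees, row):
    """Majority vote of the base classifiers (1 = defective, 0 = clean)."""
    votes = sum(tree(row) for tree in trees)
    return 1 if 2 * votes > len(trees) else 0

def f_measure(y_true, y_pred):
    """F-measure of the defective class."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def pick_cost_factor(X_tr, y_tr, X_val, y_val, candidates=(1, 2, 5, 10)):
    """Assumed stand-in for cost-factor selection: keep the candidate
    whose bagged model scores the highest validation F-measure."""
    def score(c):
        trees = fit_bagged(X_tr, y_tr, c)
        return f_measure(y_val, [predict(trees, r) for r in X_val])
    return max(candidates, key=score)
```

For example, with a single hypothetical metric per module, `fit_bagged([[1], [2], [3], [4], [10], [11], [12], [13]], [0, 0, 0, 0, 1, 1, 1, 1], cost_fn=2)` returns a list of stumps that `predict` combines by voting. Raising `cost_fn` pushes mixed leaves toward predicting "defective", which trades a higher false alarm rate for fewer missed defects; that trade-off is why the choice of cost factor matters.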