山东大学学报(工学版)
Journal of Shandong University (Engineering Science)
2015
Issue 5
pp. 36-42
7 pages
hierarchical structure; decision tree; cost-sensitive; imbalanced data; phone replacement prediction
Mobile phone user datasets are severely imbalanced between users who replace their phones and users who do not. Traditional data mining methods pursue overall accuracy on such data, which leaves the prediction accuracy for replacement users low. To address this problem, a phone replacement prediction method based on a hierarchical cost-sensitive decision tree is proposed. First, rough set theory is used to reduce the attributes of the original dataset and to calculate the importance of each attribute. Next, the attributes are partitioned into blocks according to their importance to build a hierarchical structure. Finally, a cost-sensitive decision tree, constructed with a splitting criterion that combines the Gini index and misclassification cost, serves as the base classifier at each level. Three simulation experiments on customer data from a telecom operator show that the hierarchical cost-sensitive decision tree performs well on both the original imbalanced user dataset and the balanced user dataset obtained by undersampling.
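The abstract names two building blocks, attribute importance from rough sets and a splitting criterion that mixes the Gini index with misclassification cost, without giving formulas. The sketch below shows one common way each could be realized; it is an illustration under stated assumptions, not the paper's exact method. Importance is taken as the drop in the standard rough-set dependency degree when an attribute is removed, and cost enters the Gini index by scaling each class count by its misclassification cost. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the fraction of rows whose equivalence
    class (rows with identical values on `attrs`) carries a single label."""
    seen_labels = defaultdict(set)
    sizes = defaultdict(int)
    for row, y in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(y)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, ys in seen_labels.items() if len(ys) == 1)
    return consistent / len(rows)


def significance(rows, labels, attrs):
    """Importance of each attribute: how much the dependency degree drops
    when that attribute is left out (assumed definition)."""
    full = dependency(rows, labels, attrs)
    return {
        a: full - dependency(rows, labels, [b for b in attrs if b != a])
        for a in attrs
    }


def cost_weighted_gini(labels, cost):
    """Gini impurity with every sample weighted by the cost of
    misclassifying its class (assumed way of combining the two)."""
    weights = defaultdict(float)
    for y in labels:
        weights[y] += cost[y]
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weights.values())


def best_threshold(values, labels, cost):
    """Return (threshold, score) minimizing the cost-weighted Gini of the
    split `value <= threshold` versus `value > threshold`."""
    pairs = sorted(zip(values, labels))
    total_w = sum(cost[y] for y in labels)
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        w_left = sum(cost[y] for y in left)
        w_right = sum(cost[y] for y in right)
        score = (w_left * cost_weighted_gini(left, cost)
                 + w_right * cost_weighted_gini(right, cost)) / total_w
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


# Toy data (made up): label 1 = replacement user, 0 = non-replacement user.
rows = [{"age": 0, "plan": 0}, {"age": 0, "plan": 1},
        {"age": 1, "plan": 0}, {"age": 1, "plan": 1}]
labels = [0, 1, 0, 1]
importance = significance(rows, labels, ["age", "plan"])
split = best_threshold([1, 2, 3, 4], [0, 0, 1, 1], {0: 1, 1: 5})
```

Raising the cost of the minority class makes nodes that still contain replacement users look more impure, so the tree keeps splitting where a plain Gini criterion would settle for the majority label. Attributes ranked by `significance` could then be grouped into blocks, with one such tree trained per level.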