COMPUTER ENGINEERING AND DESIGN
2015, No. 6, pp. 1504-1509 (6 pages)
Zhou Hao (周浩), Liu Ping (刘萍), Qiu Taorong (邱桃荣), Bai Xiaoming (白小明)
granular computing; parallel; MapReduce; decision tree; information gain
Traditional decision tree classification algorithms cannot effectively handle mass data mining. To address this, a parallel ID3 decision tree classification method based on granular computing (GrC) was studied in combination with the MapReduce parallel processing model. Based on a binary representation of information granules, a binary information granule vector was constructed for each attribute, and the dataset was represented as a binary information granule correlation matrix. On this basis, a method for computing the information gain of attributes was proposed, and a MapReduce-based parallel GrC decision tree classification algorithm was designed. Experiments on UCI benchmark datasets and real lightning data from a meteorological bureau verified the effectiveness of the proposed algorithm.
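The abstract describes the method only at a high level. The following is a minimal Python sketch, under our own assumptions and not the authors' implementation, of how binary granules can feed an ID3 information-gain computation split across map and reduce steps. Each attribute value's granule is a Python int used as a bit vector over the rows of one data split; class counts inside a granule come from a bitwise AND followed by a popcount; per-split counts are summed before the gain is computed. The names `granules`, `map_chunk`, `reduce_counts`, and `info_gain` are hypothetical. The map and reduce phases are simulated in a single process with `functools.reduce` rather than run on Hadoop, and the paper's binary information granule correlation matrix is approximated here by per-value bitmasks rather than reproduced.

```python
from collections import Counter
from functools import reduce
from math import log2


def popcount(x: int) -> int:
    """Number of set bits, i.e. the number of rows covered by a granule."""
    return bin(x).count("1")


def granules(rows, col):
    """Binary granules of attribute `col`: value -> int bitmask, bit i set iff rows[i][col] == value."""
    vecs = {}
    for i, row in enumerate(rows):
        vecs[row[col]] = vecs.get(row[col], 0) | (1 << i)
    return vecs


def map_chunk(rows, attr_cols, label_col):
    """Map step (hypothetical): count (attribute, value, class) co-occurrences in one data split.

    Each count is the popcount of an attribute-value granule ANDed with a class granule.
    """
    class_granules = granules(rows, label_col)
    out = Counter()
    for col in attr_cols:
        for value, g in granules(rows, col).items():
            for label, c in class_granules.items():
                n = popcount(g & c)
                if n:
                    out[(col, value, label)] += n
    return out


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge partial counts emitted by different splits."""
    return a + b


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(counts: Counter, col) -> float:
    """ID3 information gain of attribute `col`, computed from merged counts."""
    by_value, by_label = {}, Counter()
    for (c, value, label), n in counts.items():
        if c == col:
            by_value.setdefault(value, Counter())[label] += n
            by_label[label] += n
    total = sum(by_label.values())
    if total == 0:
        return 0.0
    cond = sum(sum(v.values()) / total * entropy(list(v.values()))
               for v in by_value.values())
    return entropy(list(by_label.values())) - cond


if __name__ == "__main__":
    # Toy data: (outlook, humidity, class); humidity alone determines the class.
    rows = [("sunny", "high", "no"), ("sunny", "normal", "yes"),
            ("rain", "high", "no"), ("rain", "normal", "yes")]
    splits = [rows[:2], rows[2:]]                       # two simulated mappers
    partial = [map_chunk(s, [0, 1], 2) for s in splits]
    total = reduce(reduce_counts, partial, Counter())   # one simulated reducer
    best = max([0, 1], key=lambda c: info_gain(total, c))
    print(best)  # 1 (humidity: gain 1.0 vs. 0.0 for outlook)
```

Only counts cross the map/reduce boundary, so bit positions can stay local to each split and the merged counts are identical to those computed on the whole dataset at once; ID3 then picks the attribute with the highest gain as the split node.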