计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2010
Issue 3
pp. 115-117
3 pages in total
classification%K-nearest neighbor algorithm%attribute value%information entropy
To overcome the shortcomings of the traditional KNN algorithm and the distance-weighted KNN algorithm in their distance definition and voting scheme, an improved algorithm, Entropy-KNN, is proposed based on the importance of attribute values to the class. First, the distance between two samples is defined as the average information entropy of the attribute values they share; this distance effectively measures the similarity of the samples through their important attribute values. Second, Entropy-KNN uses this distance to select the K nearest neighbors of the test sample. Finally, the class of the test sample is decided from the average distance and the number of neighbors in each class. Experiments on the mushroom data set show that Entropy-KNN achieves higher classification accuracy than the traditional KNN algorithm and the distance-weighted KNN algorithm.
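The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation, because the abstract does not give the exact formulas. Several details are assumptions made here: the entropy of an attribute value is taken as the entropy of the class distribution among training samples that have that value; two samples with no shared attribute value get the maximum entropy as their distance; and the final vote scores each class as neighbor count divided by average distance. The function names `value_entropies`, `entropy_distance`, and `entropy_knn_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def value_entropies(X, y):
    """Class-distribution entropy for each (attribute index, value) pair (assumed definition)."""
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, v)][label] += 1
    entropies = {}
    for key, class_counts in counts.items():
        total = sum(class_counts.values())
        entropies[key] = sum(
            -(c / total) * math.log2(c / total) for c in class_counts.values()
        )
    return entropies


def entropy_distance(a, b, entropies, max_entropy):
    """Average entropy over the attributes on which a and b have the same value."""
    shared = [
        entropies.get((j, va), max_entropy)
        for j, (va, vb) in enumerate(zip(a, b))
        if va == vb
    ]
    if not shared:
        # Assumption: samples with nothing in common are maximally distant.
        return max_entropy
    return sum(shared) / len(shared)


def entropy_knn_predict(X, y, query, k=3, eps=1e-9):
    """Pick the K nearest neighbors, then vote by count and average distance."""
    entropies = value_entropies(X, y)
    n_classes = len(set(y))
    max_entropy = math.log2(n_classes) if n_classes > 1 else 0.0
    neighbors = sorted(
        ((entropy_distance(row, query, entropies, max_entropy), label)
         for row, label in zip(X, y)),
        key=lambda pair: pair[0],
    )[:k]
    by_class = defaultdict(list)
    for dist, label in neighbors:
        by_class[label].append(dist)

    def score(label):
        # Assumed scoring rule: more neighbors and smaller average distance win.
        dists = by_class[label]
        return len(dists) / (sum(dists) / len(dists) + eps)

    return max(by_class, key=score)


# Toy categorical data in the spirit of the mushroom data set (invented values).
X = [("smooth", "brown"), ("smooth", "white"), ("scaly", "brown"), ("scaly", "white")]
y = ["edible", "edible", "poisonous", "poisonous"]
prediction = entropy_knn_predict(X, y, ("smooth", "brown"), k=3)
```

In this toy data the first attribute separates the classes perfectly (entropy 0) while the second is uninformative (entropy 1), so neighbors that match on the first attribute end up closest. Under this literal reading of the definition, an identical training sample is not guaranteed the smallest distance, so the paper's exact formula may well differ.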