江西师范大学学报(自然科学版)
JOURNAL OF JIANGXI NORMAL UNIVERSITY(NATURAL SCIENCES EDITION)
2015
Issue 3
pp. 297-303, 314
8 pages in total
万韩永%左家莉%万剑怡%王明文
文本分类%KNN%样本重要性原理%SI-KNN
text classification%KNN%sample importance principle%SI-KNN
KNN是重要数据挖掘算法之一,具有良好的文本分类性能.传统的KNN方法对所有样本权重看作相同,而忽略了不同样本对于分类贡献的不同.为了解决该个问题,提出了一种样本重要性原理,并在此基础上构造KNN分类器.应用随机游走算法识别类边界点,并计算出每个样本点的边界值,生成每个样本点的重要性得分,将样本重要性与KNN方法融合形成一种新的分类模型———SI-KNN.在中英文文本语料上的实验表明:改进的SI-KNN分类模型相比于传统的KNN方法有一定的提高.
As one of the top ten data mining algorithms, KNN performs well in text classification. The traditional KNN method assigns the same weight to every sample and ignores the fact that different samples contribute differently to classification. To solve this problem, a sample importance principle is proposed, and a KNN classifier is constructed on the basis of it. A random walk algorithm is used to identify the samples near the class boundaries and to compute a boundary value for each sample, from which an importance score for each sample is generated. Sample importance is then combined with the KNN method to form a new classification model, SI-KNN. Experimental results on Chinese and English text corpora show that the SI-KNN classifier achieves some improvement over the traditional KNN method.
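
The idea of combining per-sample importance with a KNN vote can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the random-walk formulation, so the importance scores are simply passed in as an input, and the cosine similarity measure, the multiplicative vote, and the names `cosine` and `weighted_knn_predict` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_knn_predict(query, samples, labels, importance, k=3):
    """Classify `query` by an importance-weighted vote of its k nearest samples.

    `importance[i]` is a precomputed, non-negative score for training sample i.
    How the paper derives it from random-walk boundary values is not specified
    in the abstract, so it is treated here as given (assumption).
    """
    sims = [cosine(query, s) for s in samples]
    # Indices of the k training samples most similar to the query.
    nearest = sorted(range(len(samples)), key=lambda i: sims[i], reverse=True)[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Assumed combination rule: similarity scaled by sample importance.
        votes[labels[i]] += importance[i] * sims[i]
    return max(votes, key=votes.get)


# Toy data: two samples of class "a" and one of class "b".
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
labels = ["a", "a", "b"]
query = [1.0, 0.2]

uniform = weighted_knn_predict(query, samples, labels, [1.0, 1.0, 1.0])
skewed = weighted_knn_predict(query, samples, labels, [0.1, 0.1, 5.0])
```

With uniform importance the function reduces to an ordinary similarity-weighted KNN vote; with skewed importance a single highly weighted neighbour can outvote several low-weighted ones, which is the effect the sample importance principle relies on.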