计算机技术与发展
COMPUTER TECHNOLOGY AND DEVELOPMENT
2014
Issue 6
pp. 71-74
4 pages in total
data mining%text classification%KNN%association analysis
The KNN algorithm is widely used in text classification, a branch of data mining. After analyzing the shortcomings of the traditional KNN method, this paper proposes an improved KNN algorithm based on association analysis. The method first extracts, for each class of training documents, the frequent feature sets of that class and the documents associated with them. When a document of unknown class is to be classified, the results of this association analysis are used to determine an appropriate number of nearest neighbors k and to select k neighbors quickly from the labeled training documents; the class of the document is then decided from the classes of its neighbors. Compared with text classification based on traditional KNN, the improved method determines a suitable value of k more readily and reduces time complexity. Experimental results show that the proposed method improves both the efficiency and the accuracy of text classification.
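The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the frequent feature sets are mined or how k is derived from them, so the brute-force itemset counting, the cosine similarity, the rule for choosing k, and the names `frequent_sets`, `train`, `classify`, `min_support`, `max_size` and `max_k` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def frequent_sets(docs, min_support=0.5, max_size=2):
    """Map each frequent term set of one class to the ids of the docs containing it.

    docs is a list of (doc_id, terms). Brute-force counting is an assumption;
    the abstract does not name a mining algorithm.
    """
    result = {}
    for size in range(1, max_size + 1):
        counts = {}
        for idx, terms in docs:
            for combo in combinations(sorted(set(terms)), size):
                counts.setdefault(frozenset(combo), set()).add(idx)
        for itemset, ids in counts.items():
            if len(ids) >= min_support * len(docs):
                result[itemset] = ids
    return result


def cosine(a, b):
    """Cosine similarity between two term lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def train(corpus):
    """corpus is a list of (terms, label). Returns frequent sets per class."""
    by_class = {}
    for idx, (terms, label) in enumerate(corpus):
        by_class.setdefault(label, []).append((idx, terms))
    return {label: frequent_sets(docs) for label, docs in by_class.items()}


def classify(terms, corpus, model, max_k=5):
    query = set(terms)
    # Candidate neighbors: training docs tied to a frequent set the query contains.
    candidates = set()
    for sets in model.values():
        for itemset, ids in sets.items():
            if itemset <= query:
                candidates |= ids
    if not candidates:
        # No frequent set matched: fall back to searching the whole corpus.
        candidates = set(range(len(corpus)))
    # Assumed heuristic: k follows the size of the candidate pool, capped at max_k.
    k = max(1, min(max_k, len(candidates)))
    ranked = sorted(candidates, key=lambda i: cosine(terms, corpus[i][0]), reverse=True)
    votes = Counter(corpus[i][1] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Restricting the similarity search to documents associated with matching frequent sets is what would reduce the cost relative to comparing against every training document, and the size of that candidate pool gives one plausible, though assumed, basis for picking k per document.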