JOURNAL OF APPLIED SCIENCES
2013, No. 2, pp. 197-203 (7 pages)
Keywords: text categorization; hierarchical network of concepts; concept; concept discrimination; category relatedness
To achieve text categorization from the perspective of semantic understanding, which purely statistical methods cannot provide, this paper proposes a text categorization method based on concept knowledge in the Hierarchical Network of Concepts (HNC). The method has two parts: concept-based feature selection and categorization according to the category relatedness degree. In feature selection, category key concepts are mined by computing the discrimination degree between concepts and categories, and these key concepts are then used to further reduce the dimensionality of the feature space. Based on the category semantic information consisting of category key concepts and their relatedness weights, a method for computing the relatedness degree between a document and a category is proposed; this category relatedness degree serves as the measure for assigning a document to a category. Experiments show that the proposed method effectively reduces the dimensionality of the feature space, improving categorization efficiency while maintaining effectiveness, with a slight improvement in the F1 value. Compared with SVM, KNN and Bayes classifiers, the method achieves markedly higher F1 values when the number of feature terms is small; its overall performance is comparable to SVM and better than KNN and Bayes.
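The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal illustrative assumption, not the paper's actual formulas: the discrimination degree is approximated here by the share of a concept's total frequency that falls in one category, and category relatedness by a weighted sum of key-concept occurrences in the document. The function names and the threshold are hypothetical.

```python
# Hypothetical sketch of HNC-style categorization: (1) mine category key
# concepts via a discrimination score, (2) classify by category relatedness.
# The scoring formulas are assumed stand-ins for the paper's definitions.
from collections import Counter

def key_concepts(cat_concept_freq, threshold=0.5):
    """Select category key concepts.

    cat_concept_freq: {category: {concept: frequency}}.
    Discrimination score (assumed proxy): the fraction of a concept's
    total frequency that belongs to one category. Concepts whose score
    meets the threshold become key concepts of that category, and the
    score is kept as their relatedness weight.
    """
    categories = list(cat_concept_freq)
    concepts = {c for freq in cat_concept_freq.values() for c in freq}
    keys = {cat: {} for cat in categories}
    for c in concepts:
        freqs = [cat_concept_freq[cat].get(c, 0) for cat in categories]
        total = sum(freqs)
        if total == 0:
            continue
        for cat, f in zip(categories, freqs):
            score = f / total  # share of this concept's mass in the category
            if score >= threshold:
                keys[cat][c] = score
    return keys

def classify(doc_concepts, keys):
    """Assign the category with the highest relatedness degree:
    sum of key-concept weights times their counts in the document."""
    counts = Counter(doc_concepts)
    scores = {cat: sum(w * counts[c] for c, w in kc.items())
              for cat, kc in keys.items()}
    return max(scores, key=scores.get)
```

Because only category key concepts carry weight, non-discriminative terms drop out of the feature space entirely, which mirrors the dimensionality reduction the abstract reports.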