甘肃联合大学学报:自然科学版
Journal of Gansu Lianhe University: Natural Sciences
2012
Issue 1
pp. 51-54
4 pages
樊东辉%王治和%陈建华%许虎寅
特征选择%文档频%词频
feature selection%document frequency%word frequency
通过研究文本特征选取中权重的计算问题,提出了一种利用特征词的熵函数加权的权值的计算方法,不但考察了特征词的文档频数,而且考察了它们在文档中出现的次数,使选出的特征子集更具有较好的代表性.实验表明,改进后的算法对聚类结果有了一定的改进.
This paper studies how feature weights are computed in text feature selection and proposes a weighting method based on the entropy function of feature terms. The method considers not only the document frequency of each feature term but also the number of times the term occurs within documents, so that the selected feature subset is more representative. Experiments show that the improved algorithm yields some improvement in clustering results.
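The abstract does not give the paper's weighting formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It swaps in the classic log-entropy weighting scheme, which combines a term's spread across documents with its within-document counts. The function names `log_entropy_scores` and `select_features` and the choice to sum local weights over documents are assumptions made for this example.

```python
import math
from collections import Counter


def log_entropy_scores(docs):
    """Score terms with log-entropy weighting (an assumed stand-in scheme).

    docs: list of token lists. Returns {term: score}.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]  # per-document term counts
    total = Counter()                        # corpus-wide term counts
    for c in counts:
        total.update(c)

    scores = {}
    for term, n_total in total.items():
        plogp = 0.0   # sum of p * log(p) over documents containing the term
        local = 0.0   # sum of log(1 + tf) over documents containing the term
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / n_total
                plogp += p * math.log(p)
                local += math.log(1 + tf)
        # Global weight: 1 for a term confined to one document,
        # 0 for a term spread evenly over every document.
        g = 1.0 + plogp / math.log(n_docs) if n_docs > 1 else 1.0
        scores[term] = g * local
    return scores


def select_features(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = log_entropy_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this scheme a term that appears evenly in every document scores near zero, while a term concentrated in few documents and repeated there scores higher. Whether the paper uses this or a different entropy-based combination cannot be determined from the abstract.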