井冈山大学学报(自然科学版)
JOURNAL OF JINGGANGSHAN UNIVERSITY (SCIENCE AND TECHNOLOGY)
2013
Issue 3
pp. 41-44
4 pages
谢力%李光耀%谭云兰
互信息%特征选择%词频%文本类别%MIFC
mutual information%feature selection%word frequency%text category%MIFC
分析了传统的互信息特征选择算法的不足,针对可能赋予低频特征词过高权重的问题,利用词频、集中度这两个强信息特征指标对算法进行改进,提出了一种基于词频和文本类别的互信息改进算法(Improved Mutual Information Algorithm based on Word Frequency and Text Category,简称改进的MIFC)。实验结果表明,改进的MIFC算法提取的特征空间比传统的互信息算法有更高的精确度。
This paper analyzes the shortcomings of the traditional mutual information (MI) feature selection algorithm. To address the problem that low-frequency feature words may be assigned excessively high weights, we use two strongly informative feature indicators, word frequency and concentration, to improve the algorithm, and propose an improved MI algorithm based on word frequency and text category (the improved MIFC). Experimental results show that the feature space extracted by the improved MIFC algorithm has higher accuracy than that extracted by the traditional MI algorithm.
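
The low-frequency bias described in the abstract can be illustrated with the document-count estimate of MI commonly used in text categorization, MI(t, c) = log(A·N / ((A + C)(A + B))). The following is a minimal sketch under that assumption: the function name and the toy counts are hypothetical, and it shows only the traditional MI score, not the MIFC formula, which the abstract does not give.

```python
import math


def mutual_information(a, b, c, n):
    """Traditional MI score of a term for a category (document-count estimate).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    n: total number of documents in the corpus
    """
    if a == 0:
        # The term never co-occurs with the category: log(0) is undefined,
        # so return negative infinity by convention.
        return float("-inf")
    return math.log((a * n) / ((a + c) * (a + b)))


# Hypothetical toy corpus: 1000 documents, 100 of them in the category.
# A rare term that appears in a single document of the category ...
rare_score = mutual_information(a=1, b=0, c=99, n=1000)
# ... versus a frequent term found in 80 category documents and 40 others.
common_score = mutual_information(a=80, b=40, c=20, n=1000)

print(f"rare term:   {rare_score:.3f}")
print(f"common term: {common_score:.3f}")
```

With these toy counts the rare term scores higher than the frequent one, even though it covers only one document of the category. This is the kind of over-weighting that a correction based on word frequency and concentration is meant to counteract.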