计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015, Issue 4, pp. 1051-1057 (7 pages)
Keywords: text clustering; weighting factor; feature vector; genetic K-means; genetic control factor
To address the limitations of representing text by feature-word weights and the inefficiency of the genetic K-means operators, a text clustering method comprising text preprocessing and an improved algorithm is presented. Texts are preprocessed according to a weighting factor and a feature vector, which better reflects the differences between texts. On this basis, a genetic control factor governs the crossover and mutation of individuals, and the crossover and mutation probabilities are controlled adaptively, so that high-quality individuals pass readily into the next generation. The method thereby combines the global optimization ability of the genetic algorithm with the efficient local search of the K-means algorithm. Experimental results show that the method improves the classification accuracy of feature words and the quality of text clustering.
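The abstract does not give the formula for the adaptive probabilities or the definition of the genetic control factor, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes higher fitness is better, uses one common linear adaptive rule as a stand-in (not necessarily the authors' rule), and pairs it with a single K-means refinement step of the kind a genetic K-means loop typically applies to each individual's centroids. The names `adaptive_probability`, `kmeans_step`, `p_high`, and `p_low` are hypothetical.

```python
def adaptive_probability(fitness, f_avg, f_max, p_high, p_low):
    """Return an operator probability that shrinks for fitter individuals.

    Assumption: higher fitness is better. Individuals at or below the
    population average get p_high; above average, the probability falls
    linearly from p_high toward p_low as fitness approaches f_max.
    """
    if fitness <= f_avg or f_max == f_avg:
        return p_high
    scale = (f_max - fitness) / (f_max - f_avg)
    return p_low + (p_high - p_low) * scale


def kmeans_step(points, centroids):
    """One K-means iteration: assign each point to its nearest centroid,
    then recompute every centroid as the mean of its assigned points."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]
```

In a loop of this kind, each individual would encode a set of centroids, `adaptive_probability` would be evaluated once for crossover and once for mutation using the current population's average and maximum fitness, and `kmeans_step` would refine the offspring before fitness is recomputed. Lowering the probabilities for the fittest individuals is what protects them from disruption on the way into the next generation.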