Bulletin of Science and Technology (科技通报)
2015
Issue 6
pp. 139-141 (3 pages)
multi-factor analysis of variance; text vector; feature mining
Text vector feature mining is used in information resource organization and management and has considerable application value in big data mining. Traditional text feature mining algorithms, which rely on K-means, have poor accuracy. This paper proposes a text vector feature mining algorithm based on multi-factor analysis of variance. Multi-factor analysis of variance is used to obtain the feature mining regularities of several corpora. This is combined with an ant colony algorithm: following a regularized training transfer rule based on ant colony fitness probability, the algorithm obtains the maximum effective-feature probability of the data set at the most recent moment of population evolution. Initial cluster centers for K-means are then selected by an optimal-partition method, which first partitions the data samples and then determines the initial centers from the distribution of the samples, improving text feature mining performance. Simulation results show that the algorithm improves the clustering of text vector features and, in turn, feature mining performance. It achieves high recall and detection rates for data features at a low time cost, and has considerable application value in data mining and related fields.
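The partition-then-initialize step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the samples are partitioned, so the rule used here is an assumption made only for illustration: sort the samples along the coordinate with the largest variance, cut them into k contiguous groups, and use each group's mean as an initial center. The function names `partition_init` and `kmeans` are hypothetical, and the multi-factor analysis of variance and ant colony stages are not covered.

```python
from statistics import pvariance


def partition_init(points, k):
    """Choose k initial centers by partitioning the samples first.

    Assumption (not taken from the paper): samples are sorted along the
    coordinate with the largest variance and cut into k contiguous
    groups of near-equal size; each group's mean becomes one center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    dims = len(points[0])
    # Coordinate along which the samples are most spread out.
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    centers = []
    for i in range(k):
        group = ordered[i * n // k:(i + 1) * n // k]
        centers.append(tuple(sum(p[d] for p in group) / len(group)
                             for d in range(dims)))
    return centers


def kmeans(points, k, iterations=20):
    """Standard Lloyd iterations, started from partition_init centers."""
    dims = len(points[0])
    centers = partition_init(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every sample to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((p[d] - centers[c][d]) ** 2
                                  for d in range(dims)),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(p[d] for p in cl) / len(cl) for d in range(dims))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    final_centers, final_clusters = kmeans(samples, k=2)
    print(final_centers)
    print([len(c) for c in final_clusters])
```

Because the starting centers depend only on the data, repeated runs give the same clustering, unlike random initialization. Whether this particular partition rule matches the one used in the paper cannot be determined from the abstract.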