计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2010
Issue 9
pp. 2013-2015, 2019
4 pages in total
Chinese texts; text classification; clustering algorithm; hierarchical clustering; K-means
To improve the quality and efficiency of text clustering, a text clustering algorithm for high-dimensional sparse similarity matrices is designed and implemented. It builds on an analysis of existing hierarchical clustering and K-means algorithms and targets the large volume and real-time requirements of Internet information processing. The algorithm combines the ideas of hierarchical clustering and K-means clustering: a threshold controls which clustering procedure is selected and when a new cluster is created, and clustering is carried out through text feature extraction and computation of a document similarity matrix. Experimental results show that the algorithm achieves higher recall and accuracy.
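The abstract gives no pseudocode, so the following is only a minimal sketch of how a single threshold might decide between joining an existing cluster and creating a new one. It is not the authors' algorithm. The function names (`tf_vector`, `cosine`, `threshold_cluster`), the plain term-frequency features, the single-pass assignment, and the default threshold of 0.3 are all assumptions made for illustration, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Length-normalized term-frequency vector, stored sparsely as a dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    if not u or not v:
        return 0.0
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def threshold_cluster(docs, threshold=0.3):
    """Single-pass clustering (assumed scheme, not taken from the paper).

    Each document joins the most similar existing cluster when the cosine
    similarity to that cluster's centroid reaches `threshold`; otherwise it
    starts a new cluster. Returns one cluster label per document.
    """
    centroids = []  # per-cluster sum of member vectors (same direction as the mean)
    labels = []
    for tokens in docs:
        vec = tf_vector(tokens)
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # K-means-style step: fold the document into the chosen centroid.
            for term, weight in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + weight
        else:
            # No cluster is similar enough: open a new one.
            centroids.append(dict(vec))
            best = len(centroids) - 1
        labels.append(best)
    return labels
```

For example, calling `threshold_cluster` on the four token lists `["text", "clustering", "algorithm"]`, `["text", "clustering", "method"]`, `["stock", "market", "price"]`, and `["stock", "price", "index"]` with the default threshold returns `[0, 0, 1, 1]`. Raising the threshold produces more, smaller clusters; lowering it merges documents more aggressively.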