Journal of Changchun Teachers College (Natural Science Edition) (长春师范学院学报:自然科学版)
2011, No. 5, pp. 29-33 (5 pages)
Keywords: short text; feature keyword expansion; HowNet; text clustering; K-means algorithm
Short texts contain few feature keywords, and this sparsity means that a short text's vector does not express its concepts accurately. To address this problem, this paper proposes STCBCFE (Short Text Clustering Based on Concept Feature Expansion), a short-text clustering algorithm that expands feature keywords with concept attributes. The algorithm uses the concept attributes in HowNet to expand each text's feature keywords. This adds semantic features and increases the number of feature keywords that reflect the text's topic, which in turn raises the measured similarity between related short texts. Applied to short-text clustering, the expansion improves clustering quality. Experimental results show that the algorithm considerably improves both the precision and the recall of short-text clustering.
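The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. `CONCEPT_ATTRIBUTES` is a hypothetical, hand-written dictionary standing in for HowNet (no real HowNet data or API is used). Expanded attribute terms get the same weight as original keywords, and the K-means step uses squared Euclidean distance on L2-normalized term-frequency vectors with a deterministic farthest-point initialization; all of these choices are assumptions made so the example is small and reproducible. The helper names (`expand`, `cluster_short_texts`, etc.) are likewise hypothetical. The sketch illustrates the core claim: two short texts with no keywords in common have cosine similarity 0 until their keywords are expanded with shared concept attributes.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for HowNet: keyword -> concept attributes.
# These entries are invented for illustration; real HowNet is not modelled.
CONCEPT_ATTRIBUTES = {
    "apple": ["fruit", "food"],
    "banana": ["fruit", "food"],
    "laptop": ["computer", "tool"],
    "mouse": ["computer", "tool"],
}


def expand(keywords, attributes=CONCEPT_ATTRIBUTES):
    """Return the keywords followed by the concept attributes of each keyword."""
    expanded = list(keywords)
    for word in keywords:
        expanded.extend(attributes.get(word, []))
    return expanded


def to_vector(terms):
    """Term-frequency vector (sparse, as a Counter)."""
    return Counter(terms)


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def normalize(vec):
    """Scale a sparse vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def mean(vectors):
    """Component-wise mean of sparse vectors (the cluster centroid)."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Plain K-means; farthest-point initialization keeps the run deterministic."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next centroid: the vector farthest from its nearest chosen centroid.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda i: sq_dist(v, centroids[i])) for v in vectors]
        if new_labels == labels:
            break  # converged: no label changed
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:
                centroids[i] = mean(members)
    return labels


def cluster_short_texts(docs, k, use_expansion=True):
    """Optionally expand each tokenized text, vectorize, normalize, then cluster."""
    vectors = [normalize(to_vector(expand(d) if use_expansion else d)) for d in docs]
    return kmeans(vectors, k)


if __name__ == "__main__":
    docs = [["apple"], ["banana"], ["laptop"], ["mouse"]]
    print(cosine(to_vector(docs[0]), to_vector(docs[1])))  # 0.0
    print(round(cosine(to_vector(expand(docs[0])), to_vector(expand(docs[1]))), 3))  # 0.667
    print(cluster_short_texts(docs, 2, use_expansion=False))  # [0, 1, 0, 0]
    print(cluster_short_texts(docs, 2))  # [0, 0, 1, 1]
```

Without expansion every pair of one-word texts is orthogonal, so K-means has nothing to separate "fruit" texts from "computer" texts. After expansion the shared attributes (`fruit`, `food` vs. `computer`, `tool`) make related texts similar, and the two topics fall into separate clusters. How the real algorithm weights expanded terms, or which HowNet attributes it selects, is not stated in the abstract.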