集成技术
Journal of Integration Technology
2015
Issue 3
pp. 69-78
10 pages in total
章昉%颜华驹%刘明君%赵中英
data mining%short text%classification%association rules
Because they are short and their features are sparse, the short texts that dominate microblogs and other social media cannot be classified with high accuracy by traditional text classification methods. To address this problem, this paper proposes a short text classification method based on associations between terms. Strong association rules are first mined from the training set and then added to the features of each short text, which increases the feature density of the short text and in turn improves classification accuracy. Comparative experiments show that the method mitigates, to some extent, the effect of feature sparseness on the classification results and improves classification accuracy, recall, and F1 values.
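The abstract does not state the thresholds used or exactly how rules are merged into the features, so the following is only a minimal sketch of the general idea under stated assumptions: rules are restricted to single-term pairs `a -> b`, a rule counts as "strong" when it meets hypothetical `min_support` and `min_confidence` thresholds, and a short text is expanded by appending the consequent of every rule whose antecedent it contains. The function names and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def mine_strong_rules(docs, min_support=0.4, min_confidence=0.8):
    """Mine one-term rules a -> b from tokenized training documents.

    Assumption: thresholds are illustrative, not the paper's values.
    Returns a dict mapping each antecedent term to a set of consequents.
    """
    n = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))  # count each term once per document
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:  # support of the pair {a, b}
            continue
        for x, y in ((a, b), (b, a)):
            # confidence(x -> y) = count(x and y) / count(x)
            if count / term_count[x] >= min_confidence:
                rules.setdefault(x, set()).add(y)
    return rules


def expand_features(tokens, rules):
    """Append rule consequents to a short text (single pass, no chaining)."""
    expanded = list(tokens)
    present = set(tokens)
    for term in tokens:
        for consequent in sorted(rules.get(term, ())):
            if consequent not in present:
                present.add(consequent)
                expanded.append(consequent)
    return expanded


# Toy training set (hypothetical data, for illustration only).
train = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
    ["apple", "iphone"],
    ["apple", "pie", "recipe"],
    ["football", "match"],
]
rules = mine_strong_rules(train)
expanded = expand_features(["iphone", "review"], rules)
```

On this toy data only `iphone -> apple` qualifies (support 3/5, confidence 3/3), while `apple -> iphone` is rejected because its confidence is 3/4. The expanded token lists would then be passed to whatever vectorizer and classifier are being used; the abstract does not name one.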