软件学报
JOURNAL OF SOFTWARE
2014
Issue 12
2790-2807
18 pages
Zhang Lin (张林); Qian Guanqun (钱冠群); Fan Weiguo (樊卫国); Hua Kun (华琨); Zhang Li (张莉)
sentiment analysis; user review; short text; opinion mining
This paper studies user reviews posted from smart mobile devices, referred to here as “light reviews”. It points out the similarities and differences between light reviews and both early Internet reviews and prior short-text research, and uses experiments to summarize the characteristics unique to light reviews: they contain few words and span a wide range of lengths, short reviews are very numerous, and the number of reviews as a function of review length follows a power-law distribution. A series of sentiment classification experiments on light reviews yields several findings: (1) classification performance decreases as review length increases; (2) traditional feature selection and feature weighting methods do not perform well enough on light reviews; (3) polarity words, the most important features in sentiment analysis, make up a higher proportion of short reviews than of long reviews; (4) short and long reviews share a high proportion of their vocabulary. Based on these findings, the paper proposes a feature selection method based on feature co-occurrence in short reviews. By combining the information advantage of short reviews with traditional feature selection methods, the method filters out useless noise while adding effective features that benefit classification. Experimental results show that the method effectively improves classification performance on the longer reviews among light reviews.
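The abstract does not spell out the proposed algorithm. The Python below is therefore only a hypothetical sketch of the general idea of pairing a traditional filter with short-review co-occurrence; it is not the authors' method. Every detail is an assumption. Reviews are pre-tokenized word lists with binary labels (1 = positive). Chi-square stands in for "traditional feature selection". A review counts as short when it has at most `short_len` tokens. Selected terms that appear in short reviews become polarity seeds. An unselected term is added back when, in long reviews, it co-occurs with seeds at least `min_cooc` times and at least a `purity` fraction of those co-occurrences involve seeds of a single polarity. The names `chi_square` and `select_with_short_cooccurrence` and all parameters are illustrative only.

```python
from collections import Counter

# Hypothetical sketch, NOT the paper's algorithm: chi-square filtering
# followed by expansion through co-occurrence with short-review seed terms.


def chi_square(docs, labels, term):
    """Chi-square statistic of a term (presence/absence) vs. a binary label."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_with_short_cooccurrence(reviews, labels, short_len=2, top_k=2,
                                   min_cooc=2, purity=0.8):
    """Return a feature set: top_k chi-square terms plus co-occurrence additions."""
    docs = [set(r) for r in reviews]
    vocab = set().union(*docs)
    # Step 1 (traditional filter): keep the top_k terms by chi-square,
    # breaking ties alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t), t))
    selected = set(ranked[:top_k])

    # Step 2: selected terms found in short reviews become seeds; a seed's
    # polarity is the class it appears in more often among short reviews.
    pos, neg = Counter(), Counter()
    for r, d, y in zip(reviews, docs, labels):
        if len(r) <= short_len:
            (pos if y == 1 else neg).update(d & selected)
    seeds = {t: (1 if pos[t] > neg[t] else -1)
             for t in set(pos) | set(neg) if pos[t] != neg[t]}

    # Step 3: in long reviews, count how often each unselected term
    # co-occurs with positive seeds and with negative seeds.
    with_pos, with_neg = Counter(), Counter()
    for r, d in zip(reviews, docs):
        if len(r) <= short_len:
            continue
        polarities = {seeds[t] for t in d if t in seeds}
        for t in d - selected:
            if 1 in polarities:
                with_pos[t] += 1
            if -1 in polarities:
                with_neg[t] += 1

    # Step 4: add terms that co-occur often enough and mostly with one polarity.
    added = set()
    for t in set(with_pos) | set(with_neg):
        total = with_pos[t] + with_neg[t]
        if total >= min_cooc and max(with_pos[t], with_neg[t]) / total >= purity:
            added.add(t)
    return selected | added


if __name__ == "__main__":
    reviews = [["good"], ["bad"],
               ["good", "screen", "battery", "the"],
               ["bad", "lag", "battery", "the"],
               ["good", "screen", "the", "app"],
               ["bad", "lag", "the", "app"]]
    labels = [1, 0, 1, 0, 1, 0]
    print(sorted(select_with_short_cooccurrence(reviews, labels)))
    # prints ['bad', 'good', 'lag', 'screen']
```

On this toy data, chi-square alone keeps `bad` and `good`. The co-occurrence step then adds `screen` and `lag`, each of which appears in long reviews only alongside one polarity. `the`, `app`, and `battery` co-occur equally with both polarities and stay out. The purity threshold is the assumed safeguard that keeps the expansion from re-admitting the noise the first step removed.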