管理工程学报
Journal of Industrial Engineering and Engineering Management
2014
Issue 4
pp. 180-186
sentiment analysis; regular collocation feature extraction; mutual information and average mutual information; rough sets; support vector machine
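Effective and stable feature extraction and feature representation are important factors in improving the performance of online review sentiment analysis. Building on conventional features such as word bags and trigger pairs, this paper studies how to extract and represent regular collocation features in online reviews. It proposes two strategies for extracting regular collocation features, one combining mutual information with average mutual information and one based on rough sets. Starting from an analysis of the effectiveness and stability of the extraction methods, it examines the internal and external stability of the extracted collocations, and it incorporates the filtered collocation features into several sentiment analysis models. Experiments on real hotel review data show that appropriate representation and filtering of regular collocation features effectively improves the classification accuracy of sentiment analysis models. The study also finds that, when sentiment feature words are unevenly distributed within reviews, an extraction strategy based on variable precision rough rules helps to improve classification accuracy.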
Precise sentiment orientation classification models and the extraction of effective, stable features from the review context are two essential factors affecting the performance of online review sentiment analysis. Among the many complicated features that arise from language complexity, regular collocation features are found to play an important role: beyond conventional word bag and trigger pair features, their structured expressions have a great impact on sentiment orientation. To extract such features for online review sentiment analysis, this paper presents two novel approaches for capturing regular collocation features from the review corpora: one combining mutual information with average mutual information, and one based on rough sets. The extracted regular collocation features are incorporated into sentiment analysis models as inputs. Experiments on real online hotel reviews achieve generally higher precision, improving the performance of SVM models by 0.34% and that of Naïve Bayes models by 1.27%.
As for the extraction of regular collocation features, two aspects are considered essential to expressing effectively the complicated constraints on review sentiment orientation: (1) the internal stability of the regular collocation structure, which accounts for the substantial existence of the regular collocation as distinct from traditional word bags or trigger pairs, and (2) the external effectiveness of the regular collocation, which accounts for its contribution to sentiment orientation classification. The mutual information method used in this paper measures external effectiveness, while the average mutual information computation and its filtering measure internal stability. The rough set based method ensures both internal stability and external effectiveness through an α approximation rough rule extraction strategy and a maximum likelihood estimate of the regular collocation distribution.
In implementation, the approach has to deal with the non-uniform occurrence of sentiment features within a review, so variable precision strategies were introduced into the rough set approach in place of the original rough rule strategy. In the experiments, the variable precision strategies achieved the best sentiment analysis performance, 88.38% via SVM models at a threshold value of 0.85. These results show that, for online reviews with non-uniformly distributed sentiment features, the variable precision strategy discounts the voice of the minority and helps discriminate the overall sentiment orientation of the review. For online reviews with uniformly distributed sentiment features, α approximation would be a better choice to replace the original maximum likelihood estimate in pursuit of better sentiment analysis. Combining mutual information with average mutual information is also an option, offering comparable performance with less computation under the same conditions.
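The variable precision idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives no formulas, so the sketch assumes the standard variable precision rough set inclusion test, in which a rule "collocation implies class" is kept when the share of reviews containing the collocation that fall in that class reaches a threshold β. The function name `variable_precision_rules`, the toy collocations, and the toy counts are all hypothetical; only the threshold value 0.85 comes from the abstract.

```python
from collections import defaultdict


def variable_precision_rules(reviews, beta=0.85):
    """Keep rules 'feature -> label' whose inclusion degree is at least beta.

    reviews: iterable of (set_of_features, label) pairs.
    beta:    inclusion threshold; it should exceed 0.5 so that at most one
             label can qualify per feature. beta = 1.0 is the strict rule
             that tolerates no exceptions.
    Returns a dict mapping feature -> (label, inclusion_degree).
    """
    # counts[feature][label] = number of reviews with that feature and label
    counts = defaultdict(lambda: defaultdict(int))
    for features, label in reviews:
        for feature in set(features):
            counts[feature][label] += 1

    rules = {}
    for feature, by_label in counts.items():
        total = sum(by_label.values())
        # Majority label among the reviews that contain this feature
        label, hits = max(by_label.items(), key=lambda kv: kv[1])
        degree = hits / total
        if degree >= beta:
            rules[feature] = (label, degree)
    return rules


# Hypothetical toy data: one collocation is almost always negative,
# the other is genuinely mixed.
toy = (
    [({"not_at_all"}, "neg")] * 18
    + [({"not_at_all"}, "pos")] * 2
    + [({"a_bit"}, "neg")] * 6
    + [({"a_bit"}, "pos")] * 4
)

rules = variable_precision_rules(toy, beta=0.85)   # keeps only "not_at_all"
strict = variable_precision_rules(toy, beta=1.0)   # keeps nothing
```

With β = 0.85 the two dissenting reviews do not block the rule for the first collocation, whereas the strict setting β = 1.0 rejects it. This is one way to read the abstract's remark that the variable precision strategy discounts the minority when sentiment features occur non-uniformly.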