山东大学学报(工学版)
Journal of Shandong University (Engineering Science)
2014
Issue 3
pp. 36-40
5 pages
microblog sentiment lexicon; network language (internet slang); sentiment analysis; context entropy; naive Bayes
A method for building a Chinese microblog sentiment lexicon is proposed. Candidate network-language terms (internet slang) are discovered with a context-entropy strategy and then filtered a second time by TF-IDF (term frequency-inverse document frequency). The SO-PMI (semantic orientation-pointwise mutual information) algorithm is applied to a labeled microblog corpus to compute the sentiment orientation value of each network-language term, yielding a network-language sentiment lexicon. The lexicon is then applied to microblog sentiment classification experiments, and its performance is compared with that of a naive Bayes classifier. The experimental results show that classifying directly with the microblog sentiment lexicon outperforms the naive Bayes classifier, and that the classification process is simple and fast.
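
The abstract names its building blocks without giving formulas, so the following is only a minimal sketch of how three of them might fit together: a context-entropy score for a candidate term, an SO-PMI-style orientation value computed from a labeled corpus, and a lexicon-based classifier. Everything specific here is an assumption rather than the authors' formulation: the function names, the toy tokens `geili` and `keng`, the add-one smoothing, and the use of class labels in place of seed words in the SO-PMI step. The TF-IDF filtering step is omitted.

```python
import math
from collections import Counter


def context_entropy(neighbors):
    """Shannon entropy (in nats) of the tokens observed next to a candidate term.

    A high value means the candidate appears in many different contexts,
    which is commonly taken as evidence that it is a standalone word.
    """
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def so_pmi(term, docs, labels, smoothing=1.0):
    """Orientation value: PMI(term, positive) - PMI(term, negative).

    Assumed variant: co-occurrence is counted at document level against the
    class labels. P(term) cancels in the difference, leaving
    log2(P(term | pos) / P(term | neg)). Smoothing avoids division by zero.
    """
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = sum(1 for y in labels if y == "neg")
    t_pos = sum(1 for d, y in zip(docs, labels) if y == "pos" and term in d)
    t_neg = sum(1 for d, y in zip(docs, labels) if y == "neg" and term in d)
    p_t_pos = (t_pos + smoothing) / (n_pos + 2 * smoothing)
    p_t_neg = (t_neg + smoothing) / (n_neg + 2 * smoothing)
    return math.log2(p_t_pos / p_t_neg)


def classify(tokens, lexicon):
    """Sum the lexicon values of the tokens and read the class off the sign."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


# Hypothetical, already-segmented toy corpus (not data from the paper).
docs = [
    ["geili", "movie"],
    ["geili", "today"],
    ["keng", "movie"],
    ["keng", "service"],
]
labels = ["pos", "pos", "neg", "neg"]

# Build a tiny lexicon for two candidate terms and classify a new post.
lexicon = {t: so_pmi(t, docs, labels) for t in ["geili", "keng"]}
prediction = classify(["geili", "movie"], lexicon)
```

Under these assumptions, classification is a dictionary lookup and a sum per post, with no model to train at prediction time, which is consistent with the abstract's remark that the lexicon-based process is simple and fast.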