中文信息学报
中文信息學報
중문신식학보
JOURNAL OF CHINESE INFORMATION PROCESSING
2010, Issue 1, pp. 77-83 (7 pages in total)
吴琼%谭松波%张刚%段洣毅%程学旗
吳瓊%譚松波%張剛%段洣毅%程學旗
오경%담송파%장강%단미의%정학기
计算机应用%中文信息处理%跨领域%倾向性分析%图排序%EM算法
計算機應用%中文信息處理%跨領域%傾向性分析%圖排序%EM算法
계산궤응용%중문신식처리%과영역%경향성분석%도배서%EM산법
computer application%Chinese information processing%cross domain%opinion analysis%graph ranking%EM algorithm
该文主要研究文本的倾向性分析问题,即判断文本中的论断是正面还是负面的.已有的研究表明,监督分类方法对倾向性分析很有效.但是,多数情况下,已有的标注数据与待判断倾向性的数据不属于同一个领域,此时监督分类算法的性能明显下降.为解决此问题,该文提出一个算法,将文本的情感倾向性与图排序算法结合起来进行跨领域倾向性分析,该算法在图排序算法基础上,利用训练域文本的准确标签与测试域文本的伪标签来迭代进行倾向性分析.得到迭代最终结果后,为充分利用其中倾向性判断较为准确的测试文本来提高整个测试集倾向性分析的精度,将这些较准确的测试文本作为"种子",进一步通过EM算法迭代进行跨领域倾向性分析.实验结果表明,该文提出的方法能大幅度提高跨领域倾向性分析的精度.
該文主要研究文本的傾向性分析問題,即判斷文本中的論斷是正面還是負面的.已有的研究表明,監督分類方法對傾向性分析很有效.但是,多數情況下,已有的標註數據與待判斷傾向性的數據不屬於同一個領域,此時監督分類算法的性能明顯下降.為解決此問題,該文提出一個算法,將文本的情感傾向性與圖排序算法結合起來進行跨領域傾向性分析,該算法在圖排序算法基礎上,利用訓練域文本的準確標籤與測試域文本的偽標籤來迭代進行傾向性分析.得到迭代最終結果後,為充分利用其中傾向性判斷較為準確的測試文本來提高整個測試集傾向性分析的精度,將這些較準確的測試文本作為"種子",進一步通過EM算法迭代進行跨領域傾向性分析.實驗結果表明,該文提出的方法能大幅度提高跨領域傾向性分析的精度.
해문주요연구문본적경향성분석문제,즉판단문본중적론단시정면환시부면적.이유적연구표명,감독분류방법대경향성분석흔유효.단시,다수정황하,이유적표주수거여대판단경향성적수거불속우동일개영역,차시감독분류산법적성능명현하강.위해결차문제,해문제출일개산법,장문본적정감경향성여도배서산법결합기래진행과영역경향성분석,해산법재도배서산법기출상,이용훈련역문본적준학표첨여측시역문본적위표첨래질대진행경향성분석.득도질대최종결과후,위충분이용기중경향성판단교위준학적측시문본래제고정개측시집경향성분석적정도,장저사교준학적측시문본작위"충자",진일보통과EM산법질대진행과영역경향성분석.실험결과표명,해문제출적방법능대폭도제고과영역경향성분석적정도.
This paper addresses document-level opinion analysis, i.e., determining the overall opinion (e.g., positive or negative) of a given document. Existing studies have shown that supervised classification approaches usually perform well on this task. In most cases, however, the labeled data and the data to be classified come from different domains, and performance drops sharply when a model trained on the labeled domain is applied to a target domain without labeled data. This is the problem of cross-domain opinion analysis. We propose an iterative algorithm that integrates the opinion orientations of documents into a graph-ranking algorithm for cross-domain opinion analysis. The graph-ranking step iterates over the accurate labels of old-domain (training) documents together with the "pseudo" labels of new-domain (test) documents. Once the iteration has finished, we further improve performance by taking the test documents whose opinions were determined more accurately as "seeds" and applying the EM algorithm iteratively to the cross-domain task. Experimental results indicate that the proposed method substantially improves the accuracy of cross-domain opinion analysis.
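
The abstract gives no formulas, so the following is only a rough Python sketch of the kind of procedure it describes, not the authors' algorithm. Each test document's score is updated from the accurate labels of similar training documents and from the current pseudo-scores of similar test documents, and the most confidently scored test documents are then picked as seeds. The function names, the mixing weight alpha, the similarity matrices and the specific update rule are all assumptions made for illustration; the EM stage is not implemented here.

```python
import numpy as np

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m, dtype=float), where=sums > 0)

def propagate_scores(sim_test_train, sim_test_test, train_labels,
                     alpha=0.5, n_iter=50, tol=1e-6):
    """Score test documents in [-1, 1]; the sign is the predicted polarity.

    sim_test_train: (n_test, n_train) non-negative similarities (assumed given)
    sim_test_test:  (n_test, n_test) non-negative similarities (assumed given)
    train_labels:   +1 / -1 labels of the training-domain documents
    alpha:          hypothetical weight of the training-domain evidence
    """
    w_train = row_normalize(np.asarray(sim_test_train, dtype=float))
    w_test = np.asarray(sim_test_test, dtype=float).copy()
    np.fill_diagonal(w_test, 0.0)          # a document does not vote for itself
    w_test = row_normalize(w_test)
    y = np.asarray(train_labels, dtype=float)

    from_train = w_train @ y               # fixed evidence from accurate labels
    scores = from_train.copy()             # initial pseudo-labels of test docs
    for _ in range(n_iter):
        updated = alpha * from_train + (1.0 - alpha) * (w_test @ scores)
        converged = np.max(np.abs(updated - scores)) < tol
        scores = updated
        if converged:
            break
    return scores

def pick_seeds(scores, fraction=0.2):
    """Indices of the most confidently scored test documents (largest |score|)."""
    k = max(1, int(round(fraction * len(scores))))
    return np.argsort(-np.abs(scores))[:k]

# Toy example: two training documents (one positive, one negative), three test documents.
sim_test_train = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
sim_test_test = [[1.0, 0.0, 0.8], [0.0, 1.0, 0.1], [0.8, 0.1, 1.0]]
scores = propagate_scores(sim_test_train, sim_test_test, train_labels=[1, -1])
seeds = pick_seeds(scores, fraction=0.34)
```

In this toy run the third test document is equally similar to both training documents, so its polarity comes entirely from its test-domain neighbours; that is the role the pseudo-labels play in the abstract's description. In a fuller pipeline the returned seeds would serve as the initial labeled set for an EM-style classifier over the remaining test documents.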