CAJ | Academic paper

由于受到翻译腔的影响，中英平行语料库存在固有的扭斜的语言模型。显然，用这样的语料库训练的机器翻译、跨语言检索等自然语言处理系统也承袭了扭斜的语言模型，严重影响到应用系统的性能。为了克服平行语料库固有的缺陷，提出构建和剖析中英三元组可比语料库的技术研究。这项研究采用可比语料库和语言自动剖析技术，使用统计和规则相结合的方法，对由本族英语、中式英语和标准中文三元素所组成的三元组可比语料库中的本族英语和中式英语进行统计分析。在此基础上，利用n-元词串、关键词簇等自动抽取技术挖掘基于本族语言模型的双语资源，实现改进和发展机器翻译等自然语言的处理应用。
Chinese-English parallel corpora carry an inherently skewed language model because of translationese. Natural language processing systems trained on such corpora, including machine translation and cross-language information retrieval systems, inherit this skewed model, which seriously degrades application performance. To overcome this inherent defect of parallel corpora, this paper proposes building and profiling a Chinese-English 3-tuple comparable corpus. The study uses comparable-corpus and automatic language-profiling techniques, combining statistical and rule-based methods, to analyze the native English and Chinglish components of a 3-tuple comparable corpus composed of native English, Chinglish, and standard Chinese. On this basis, automatic extraction techniques such as n-grams and keyword clusters are used to mine bilingual resources grounded in a native-language model, with the aim of improving and developing natural language processing applications such as machine translation.
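The abstract does not say which statistic is used to compare native English with Chinglish, so the following is only a minimal sketch of one common way such a comparison could be done: count n-grams in each sub-corpus and rank them by a log-likelihood keyness score. The choice of log-likelihood, the whitespace tokenizer, the function names (`count_ngrams`, `log_likelihood`, `key_ngrams`), and the toy sentences are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, n):
    """Count n-grams over lower-cased, whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood keyness of one n-gram (an assumed statistic).

    a, b: frequency in the study corpus and in the reference corpus.
    total_a, total_b: total number of n-grams in each corpus.
    """
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2.0 * score


def key_ngrams(study, reference, n=2, top=5):
    """Rank n-grams that are overrepresented in `study` versus `reference`."""
    study_counts = count_ngrams(study, n)
    ref_counts = count_ngrams(reference, n)
    total_s = sum(study_counts.values())
    total_r = sum(ref_counts.values())
    if total_s == 0 or total_r == 0:
        return []
    scored = []
    for gram, a in study_counts.items():
        b = ref_counts.get(gram, 0)
        # Keep only n-grams with a higher relative frequency in the study corpus.
        if a / total_s > b / total_r:
            scored.append((" ".join(gram), log_likelihood(a, b, total_s, total_r)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Invented toy data, standing in for the Chinglish and native English components.
chinglish = [
    "we should strengthen the construction of spiritual civilization",
    "we should strengthen the management of the market",
]
native = [
    "we should improve how the market is managed",
    "we need stronger cultural values",
]

for phrase, score in key_ngrams(chinglish, native, n=2, top=3):
    print(f"{phrase}\t{score:.3f}")
```

On real data the two sub-corpora would be far larger, and a proper tokenizer and a frequency threshold would likely be needed before the scores are meaningful.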