中国科技资源导刊
CHINA SCIENCE & TECHNOLOGY RESOURCES REVIEW
2014
Issue 4
pp. 86-93
8 pages
sub-sentence alignment%word alignment%simple sub-sentence%patent text%statistical machine translation
Because sentences in patent documents tend to be long, this paper segments the training corpus for statistical machine translation into bilingual sub-sentence sequences and aligns the sub-sentences with a strategy that combines statistics and rules. A new training corpus built from these simple sub-sentences is then added to the training data, and the statistical machine translation system is retrained. To some extent, this improves the phrase alignment and word alignment of the original bilingual training corpus and makes fuller use of the translation information contained in the parallel corpus, improving translation quality. The method was applied to statistical machine translation of patents, and comparative experiments on the NTCIR-9 test set yielded satisfactory translation results.
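The abstract does not give the segmentation rules or the alignment model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes punctuation-based splitting, a simplified length-based cost in the spirit of Gale-Church sentence alignment as the "statistical" part, and a single rule (equal sub-sentence counts are aligned one-to-one). The names `split_clauses`, `align_clauses`, `MERGE_PENALTY` and the `ratio` parameter are hypothetical.

```python
import math

# Hypothetical penalty for merging two sub-sentences into one alignment unit.
MERGE_PENALTY = 0.1


def split_clauses(sentence, delims):
    """Cut a sentence after every delimiter character (assumed rule)."""
    clauses, current = [], ""
    for ch in sentence:
        current += ch
        if ch in delims:
            if current.strip():
                clauses.append(current.strip())
            current = ""
    if current.strip():
        clauses.append(current.strip())
    return clauses


def length_cost(src_len, tgt_len, ratio):
    """Cost grows as the target length departs from ratio * source length."""
    return abs(math.log((tgt_len + 1) / (src_len * ratio + 1)))


def align_clauses(src, tgt, ratio=1.0):
    """Monotone alignment of sub-sentence lists using 1-1, 1-2 and 2-1 units.

    Returns a list of (source_indices, target_indices) pairs.
    """
    n, m = len(src), len(tgt)
    # Rule (assumption): equal counts are aligned one-to-one.
    if n == m:
        return [([i], [i]) for i in range(n)]

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in ((1, 1), (1, 2), (2, 1)):
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = (cost[i][j] + length_cost(s_len, t_len, ratio)
                     + MERGE_PENALTY * (di + dj - 2))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)

    # No path found: fall back to keeping the sentence pair whole.
    if cost[n][m] == inf:
        return [(list(range(n)), list(range(m)))]

    units, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        units.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    units.reverse()
    return units


if __name__ == "__main__":
    source = split_clauses("aaaa,bb.", ",.")
    target = split_clauses("xxxx,y;z.", ",;.")
    for s_idx, t_idx in align_clauses(source, target):
        print([source[k] for k in s_idx], "<->", [target[k] for k in t_idx])
```

In a real pipeline the aligned sub-sentence pairs would be written out as additional parallel training data; a lexical score from an existing word-alignment model could replace or supplement the length cost, which is used here only because it needs no external resources.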