大连理工大学学报
JOURNAL OF DALIAN UNIVERSITY OF TECHNOLOGY
2014
Issue 1
pp. 91-99
9 pages in total
曹井香%黄德根%王伟%王帅军
短语依存树库%机器翻译%节点对齐%句法功能%语义角色
phrase dependency treebank%machine translation%node alignment%syntactic function%semantic roles
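, introductions to tourist attractions, and so on, but most of these translations are little more than window dressing. Their quality is uneven, and they cannot serve as samples for learning translation. PCEDT had the Penn English Treebank translated into Czech specifically, rather than using existing translated corpora. One of the reasons considered was that the translated texts that could be collected were translated too freely, with free translation and adaptive translation widespread, which makes deep parallelism hard to achieve. Official translations of government documents, by contrast, are meant to convey important national information to the outside world, and their translation quality is very high. These texts are samples from which translators learn, and they should also serve as samples for machine translation to learn from, in order to improve machine translation quality. This paper therefore attempts to process these bilingual texts in depth, following the way human translators work, and to align them to the greatest possible extent, so as to build a high-quality learning and evaluation corpus for machine translation research.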
A phrase dependency treebank (PDT) integrating phrase structure grammar and dependency grammar is proposed and elaborated to cater for translation studies. The construction of the DUT Parallel Chinese-English PDT (DUT-CEPDT) is reported. PDT favors flat structures, and its dependencies are based on semantics rather than syntactic functions, which differs from mainstream dependency analysis that favors binary branching. The raw texts of DUT-CEPDT are Chinese government work reports and White Papers and their official English translations. First, after word segmentation and part-of-speech (POS) tagging, the Chinese PDT and English PDT are constructed manually with the aid of LingTreeConstructor, a tool tailored for linguists. Then, node alignment, which covers translation alignments of words, phrases, and clauses up to the whole passage, is proposed instead of traditional word or sentence alignment to provide more translation knowledge. Lastly, semantic roles based on FrameNet are labeled simultaneously on the aligned nodes of the English and Chinese trees. DUT-CEPDT can serve as a resource and standard for the training and assessment of both human translators and machine translation systems.
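To make the idea of node alignment concrete, the following is a minimal sketch in Python of two flat trees whose nodes are paired at both word and clause level, with one semantic role written to both members of each aligned pair. Everything in it is an assumption made for illustration: the `Node` class, the `label_aligned` function, the category labels, the example sentence, and the role names are hypothetical and are not taken from DUT-CEPDT or LingTreeConstructor.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One node of a tree: a word, a phrase, or a clause (hypothetical schema)."""
    node_id: str
    label: str                      # illustrative category label, e.g. "NP"
    text: str
    children: List["Node"] = field(default_factory=list)
    role: Optional[str] = None      # semantic role, filled in after alignment

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def label_aligned(
    zh_root: Node,
    en_root: Node,
    alignment: List[Tuple[str, str]],
    roles: Dict[Tuple[str, str], str],
) -> None:
    """Write the same role onto both nodes of every aligned pair that has one."""
    zh_nodes = {n.node_id: n for n in zh_root.walk()}
    en_nodes = {n.node_id: n for n in en_root.walk()}
    for zh_id, en_id in alignment:
        role = roles.get((zh_id, en_id))
        if role is not None:
            zh_nodes[zh_id].role = role
            en_nodes[en_id].role = role


# Invented example sentence pair (not from the corpus), built as flat trees.
zh = Node("zh0", "S", "中国发展经济", [
    Node("zh1", "NP", "中国"),
    Node("zh2", "V", "发展"),
    Node("zh3", "NP", "经济"),
])
en = Node("en0", "S", "China develops the economy", [
    Node("en1", "NP", "China"),
    Node("en2", "V", "develops"),
    Node("en3", "NP", "the economy"),
])

# Node alignment pairs whole clauses as well as words and phrases.
alignment = [("zh0", "en0"), ("zh1", "en1"), ("zh2", "en2"), ("zh3", "en3")]

# Placeholder role names, assumed for illustration only.
roles = {("zh1", "en1"): "Agent", ("zh3", "en3"): "Theme"}

label_aligned(zh, en, alignment, roles)
for zh_id, en_id in alignment:
    zh_node = next(n for n in zh.walk() if n.node_id == zh_id)
    en_node = next(n for n in en.walk() if n.node_id == en_id)
    print(zh_node.text, "<->", en_node.text, "| role:", zh_node.role)
```

Storing the alignment as pairs of node identifiers, rather than word offsets, is what lets a single list cover words, phrases, and clauses alike. A real treebank would also need to handle one-to-many and unaligned nodes, which this sketch leaves out.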