计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015
Issue 8
pp. 2297-2302
6 pages
麦合甫热提%麦热哈巴·艾力%米莉万·雪合来提
word alignment%Uyghur-Chinese machine translation%Uyghur-Chinese word alignment%affix granularity%morphological analysis
In Uyghur, the complex morphology of words is the main cause of data sparseness. To reduce the negative effects of data sparseness on word alignment and machine translation, while retaining as much of the semantic information carried by affixes as possible, a separating-dropping method for affixes is proposed. Based on statistical analysis, Uyghur words are first segmented into stems and affixes; affixes whose semantic information has a high probability of being explicitly translated are then separated from the stem, and affixes with a low probability are dropped. The method is applied to Uyghur nouns and verbs, and 9 templates are constructed at graded levels for the experiments. The results show that the method curbs the excessive sentence length caused by stem-affix separation, increases the number of Uyghur-Chinese word pairs, and improves the quality of Uyghur-Chinese machine translation, which verifies its effectiveness.
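The separating-dropping decision described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `apply_separate_drop`, the `+` marker on separated affixes, the threshold of 0.5, and the placeholder affix labels and probabilities are all hypothetical, and the input is assumed to be already segmented into stems and affixes by some morphological analyzer.

```python
from typing import Dict, List, Tuple

# Assumed input shape: each word is already segmented as (stem, [affixes]).
SegmentedWord = Tuple[str, List[str]]


def apply_separate_drop(
    words: List[SegmentedWord],
    translated_prob: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Rewrite a segmented sentence into tokens for word alignment.

    Affixes whose estimated probability of being explicitly translated is
    at least `threshold` are kept as separate tokens ("separate"); all
    other affixes are removed ("drop"). Unknown affixes are dropped.
    """
    tokens: List[str] = []
    for stem, affixes in words:
        tokens.append(stem)
        for affix in affixes:
            if translated_prob.get(affix, 0.0) >= threshold:
                tokens.append("+" + affix)  # separated affix token
            # otherwise the affix is dropped and contributes no token
    return tokens


# Placeholder labels and probabilities, for illustration only.
sentence = [("stem1", ["AFFIX_A", "AFFIX_B"]), ("stem2", [])]
probs = {"AFFIX_A": 0.9, "AFFIX_B": 0.1}
print(apply_separate_drop(sentence, probs))
```

Dropping the low-probability affixes is what keeps the token sequence shorter than full stem-affix separation would, which is the sentence-length effect the abstract reports.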