中文信息学报
JOURNAL OF CHINESE INFORMATION PROCESSING
2010, Issue 1, pp. 117-122 (6 pages)
Keywords: artificial intelligence; machine translation; Japanese-Chinese machine translation system; Japanese word segmentation; Japanese POS tagging; joint word segmentation
Word segmentation and part-of-speech (POS) tagging are the first step in Japanese natural language processing tasks such as machine translation with Japanese as the source language. This paper proposes a Japanese word segmentation and POS tagging approach that combines rules and statistics. It adopts a joint word segmentation and POS tagging algorithm based on a single perceptron as the basic framework and adds, as features, adjacency attributes of words derived from heuristic rules. Experiments on a small test set show that the approach achieves an F-score of 98.2% on word segmentation and 94.8% on joint word segmentation and POS tagging. The method has been successfully applied in a Japanese-Chinese machine translation system.
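The abstract reports two F-scores, one for segmentation alone and one for segmentation plus POS tagging. The record does not describe the evaluation script, so the following is only a minimal sketch of the conventional span-based definition, under the assumption that a predicted word counts as correct when its character span matches a gold word (and, for the joint score, when its tag also matches). The function names, the example sentence, and the tag labels are hypothetical and are not taken from the paper.

```python
def spans(tagged_words):
    """Convert [(word, tag), ...] into a set of (start, end, tag) character spans."""
    out, pos = set(), 0
    for word, tag in tagged_words:
        out.add((pos, pos + len(word), tag))
        pos += len(word)
    return out


def f_score(gold, pred, with_tags=False):
    """Span-based F-score; with_tags=True also requires the POS tag to match."""
    g, p = spans(gold), spans(pred)
    if not with_tags:
        # Drop the tag so that only word boundaries are compared.
        g = {(s, e) for s, e, _ in g}
        p = {(s, e) for s, e, _ in p}
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted analyses of one sentence (illustrative tags).
gold = [("私", "N"), ("は", "P"), ("学生", "N"), ("です", "AUX")]
pred = [("私", "N"), ("は", "P"), ("学生", "N"), ("です", "V")]

seg_f = f_score(gold, pred)                     # boundaries only
joint_f = f_score(gold, pred, with_tags=True)   # boundaries and tags
```

In this toy example the segmentation is perfect while one tag is wrong, so the joint score is lower than the segmentation score, which is the same relationship as between the 98.2% and 94.8% figures reported above.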