Journal of Tibet University (Natural Science Edition)
2015, No. 2, pp. 96-104, 110 (10 pages in total)
Keywords: Tibetan automatic word segmentation; Tibetan numerals; Tibetan numeral structure
Automatic word segmentation is the foundation of Tibetan natural language processing. By analyzing the classification, word forms, and structures of numerals in authentic Tibetan text, this paper proposes a rule-based method for recognizing Tibetan numerals. The core idea is recombination during automatic segmentation: the word about to be segmented (w_i) is examined together with the previously segmented word (w_{i-1}) to decide whether the two should be recombined. Tested on the Tibetan-language edition of the primary-school mathematics textbooks for grades one through six and on a corpus of 1,500 sentences containing various types of numerals, the method reached a numeral recognition accuracy of 97.7%.
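The recombination idea (look at w_i together with w_{i-1} and merge them when both belong to a numeral) can be illustrated with a minimal sketch. This is not the paper's rule set, which the abstract does not spell out. The function names `is_numeral` and `merge_numerals` and the tiny numeral lexicon are hypothetical. Tokens are written in Wylie transliteration for readability, so a space stands in for the Tibetan tsheg between syllables. A real system would operate on Tibetan script and would need rules for connectors and ordinal suffixes that this sketch ignores.

```python
from typing import Iterable, List

# Tibetan digits occupy code points U+0F20 (zero) through U+0F29 (nine).
TIBETAN_DIGITS = {chr(cp) for cp in range(0x0F20, 0x0F2A)}

# Hypothetical, deliberately small lexicon of numeral words in Wylie
# transliteration (one..nine, ten, hundred, thousand). Illustration only.
NUMERAL_WORDS = {
    "gcig", "gnyis", "gsum", "bzhi", "lnga", "drug", "bdun", "brgyad", "dgu",
    "bcu", "brgya", "stong",
}


def is_numeral(token: str) -> bool:
    """Return True if the token is a known numeral word or a run of digits."""
    if token in NUMERAL_WORDS:
        return True
    # Accept non-empty runs made only of Tibetan or ASCII digits.
    return bool(token) and all(
        ch in TIBETAN_DIGITS or (ch.isascii() and ch.isdigit()) for ch in token
    )


def merge_numerals(tokens: Iterable[str], joiner: str = " ") -> List[str]:
    """Merge adjacent numeral tokens produced by a segmenter.

    For each incoming word w_i, check whether the previously emitted word
    w_{i-1} was numeral; if both are, append w_i to w_{i-1} instead of
    emitting it as a separate word. `joiner` is a space here because the
    tokens are Wylie; a Tibetan-script pipeline might join with the tsheg.
    """
    out: List[str] = []
    prev_is_num = False  # numeral status of the last emitted word, w_{i-1}
    for w in tokens:
        cur_is_num = is_numeral(w)
        if out and prev_is_num and cur_is_num:
            out[-1] = out[-1] + joiner + w  # recombine w_{i-1} and w_i
        else:
            out.append(w)
        prev_is_num = cur_is_num
    return out


if __name__ == "__main__":
    # "deb bcu gsum yod" is roughly "there are thirteen books".
    print(merge_numerals(["deb", "bcu", "gsum", "yod"]))
```

Tracking the numeral status of w_{i-1} in a flag, rather than re-running `is_numeral` on the merged string, lets a compound such as "bcu gsum" keep absorbing further numeral words even though the compound itself is not in the lexicon.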