计算机工程
COMPUTER ENGINEERING
2014
Issue 12
pp. 141-145 (5 pages)
户冰心%古丽拉·阿东别克%祁卉
Kazakh%natural language processing%ambiguity%additional component%conditional random field model%template
By studying a large number of phrase instances that contain ambiguity, this paper analyzes the ambiguity in determining phrase structure boundaries that arises during computer processing. For the common ambiguous pattern "v+n+n", it uses a conditional random field (CRF) model for disambiguation. Drawing on the characteristics of the Kazakh language, it proposes a method that builds feature templates from the category and position information of Kazakh word endings (suffixes). The experimental corpus is 30 days of data from the 2008 Xinjiang Daily (Kazakh edition). With the disambiguation strategy added, recognition precision for noun phrases and verb phrases reaches 87.23% and 97.46%, and recall reaches 80.12% and 95.80%, respectively. The experimental results show that introducing the extracted features into the CRF model improves the system's precision, recall, and F-value.
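The abstract describes feature templates built from the category and position of Kazakh suffixes but does not give the templates themselves. The sketch below illustrates the general idea only and is not the authors' template set. The `Token` fields, the suffix category labels (`PARTICIPLE`, `GENITIVE`, `POSSESSIVE`), the feature names, and the window size are all hypothetical. The words are placeholders, not corpus data.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str        # surface form (placeholder strings, not real corpus data)
    pos: str         # coarse part-of-speech tag, e.g. "v" or "n"
    suffix_cat: str  # hypothetical suffix category label; "NONE" if no suffix


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Features for token i: POS tag and suffix category at each relative position."""
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(sent)
        # Positions outside the sentence get an explicit padding value.
        feats[f"pos[{offset}]"] = sent[j].pos if inside else "PAD"
        feats[f"suf[{offset}]"] = sent[j].suffix_cat if inside else "PAD"
    # Conjunction of the current and next suffix categories (assumed template):
    # in a v+n+n sequence, the ending on the first noun may hint at whether
    # it attaches to the verb or to the following noun.
    if i + 1 < len(sent):
        feats["suf[0]|suf[1]"] = f"{sent[i].suffix_cat}|{sent[i + 1].suffix_cat}"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(sent, i) for i in range(len(sent))]


# A placeholder v+n+n sequence with made-up suffix categories.
example = [
    Token("w1", "v", "PARTICIPLE"),
    Token("w2", "n", "GENITIVE"),
    Token("w3", "n", "POSSESSIVE"),
]
features = sentence_features(example)
```

Per-token feature dictionaries of this shape can be passed to CRF toolkits that accept dictionary features. Which toolkit and template format the authors used is not stated in this record.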