西北师大学报(社会科学版)
JOURNAL OF NORTHWEST NORMAL UNIVERSITY (SOCIAL SCIENCES)
2014, No. 6, pp. 135-140 (6 pages)
Keywords: official documents; abbreviations; bivariate correlation; identification
Abstract: Applying correlation theory, this paper builds a sampled corpus of contemporary Chinese official documents on political and educational affairs, containing more than 12 million characters. After the corpus is processed by word segmentation, tagging, and related annotation, a sampled statistical analysis is performed on the bivariate correlation between pairs of its words. On this basis, two-syllable (disyllabic) abbreviations are identified and extracted with fairly satisfactory results, offering a new approach and method for the automatic identification of abbreviations and the automatic understanding of official documents.
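The abstract does not state which correlation statistic or which abbreviation-formation rule the study used. As a hedged illustration only, the sketch below assumes pointwise mutual information (PMI) over adjacent word pairs in an already segmented corpus, and assumes that a disyllabic abbreviation takes the first character of each word in a strongly associated two-word phrase (as 科技 comes from 科学技术). The function names `bigram_pmi` and `abbreviation_candidates` and the thresholds `min_count` and `min_pmi` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical sketch: the paper's actual statistic and rules are not given.
# ASSUMPTION 1: word-pair correlation is measured with PMI over adjacent words.
# ASSUMPTION 2: a disyllabic abbreviation is the first character of each word
#               in a strongly associated two-word phrase.


def bigram_pmi(sentences: Iterable[List[str]]) -> Tuple[dict, Counter, Counter]:
    """Return PMI for each adjacent word pair, plus unigram and bigram counts."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return pmi, unigrams, bigrams


def abbreviation_candidates(sentences, min_count=2, min_pmi=1.0):
    """Pair strongly associated two-word phrases with an attested
    two-character token built from the first character of each word."""
    sentences = list(sentences)
    pmi, unigrams, bigrams = bigram_pmi(sentences)
    found = []
    for (w1, w2), score in sorted(pmi.items(), key=lambda kv: -kv[1]):
        # Skip rare or weakly associated pairs.
        if bigrams[(w1, w2)] < min_count or score < min_pmi:
            continue
        if len(w1) < 2 or len(w2) < 2:
            continue
        candidate = w1[0] + w2[0]  # e.g. 科学 + 技术 -> 科技
        if unigrams[candidate] > 0:  # keep only abbreviations seen in the corpus
            found.append((candidate, w1 + w2, round(score, 2)))
    return found


corpus = [
    ["加强", "科学", "技术", "创新"],
    ["发展", "科学", "技术"],
    ["推动", "科技", "进步"],
    ["扩大", "经济", "贸易", "合作"],
    ["经济", "贸易", "往来"],
    ["深化", "经贸", "合作"],
]
for short, full, score in abbreviation_candidates(corpus):
    print(short, full, score)
```

On this toy corpus, 科技 is paired with 科学技术 and 经贸 with 经济贸易, while pairs seen only once are filtered out by `min_count`. Requiring the abbreviation to appear as a token keeps the sketch conservative; the first-character rule will, however, miss abbreviations formed from other character positions.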