Computer Science (计算机科学)
2010
Issue 3
6-10, 16
6 pages in total
Zhang Haijun (张海军)%Shi Shumin (史树敏)%Zhu Chaoyong (朱朝勇)%Huang Heyan (黄河燕)
New word identification%Unknown words%Candidate strings%Training corpus%POS guessing
New word identification (NWI) is a key technology in Chinese information processing. It comprises two main tasks: extracting and filtering candidate strings, and guessing the part of speech (POS) of new words. Because Chinese has no explicit symbol marking word boundaries, any sequence of adjacent characters may form a word, which makes extracting and filtering new words very difficult. Moreover, because no prior knowledge or statistical data are available for new words, guessing their POS has long been a technical bottleneck in Chinese POS tagging. This paper analyzes the state of research on Chinese NWI in detail, focusing on the methods and main open problems in candidate new word extraction and POS guessing, and concludes with an outlook on future research directions for NWI.