Computer Science (计算机科学)
2010
Issue 3
pp. 230-233
4 pages
Chinese-English verb subcategorization%Statistical analysis%Support vector machine (SVM)
Using a large-scale, sentence-aligned Chinese-English parallel corpus, this paper describes a systematic experiment in the statistical analysis of Chinese-English verb subcategorization frames. First, with linguistic measures (lexical and grammatical compatibility) as heuristics, the probability distribution of 654 bilingual subcategorization frames is given a preliminary estimate by a two-fold maximum-likelihood (MLE) statistical filtering method. Second, the frames are classified linguistically according to the syntactic characteristics of Chinese and English. Finally, for each frame and its supporting corpus, linguistic class labeling and statistical reliability analysis are carried out with a support vector machine (SVM).