北京师范大学学报:社会科学版
Journal of Beijing Normal University (Social Science Edition)
2010
Issue 5
pp. 53-58
句本位 中心词分析法 树库 自动句法分析
sentence-based syntax; head-driven sentence analysis method; treebank; automatic parsing
树库是一种带句法标注的语料库,它记录着真实文本中每个句子的句法分析结果——句法树。上世纪90年代,自然语言的自动句法分析再次成为国际计算语言学界关注的焦点,一个重要原因是美国宾州树库PTB的建成。根据树库自动归纳出来的概率型上下文无关语法,使英语的句法分析器在性能上显著超越了先前基于规则和合一运算的句法分析器。世界上为各种自然语言构建的树库,不论是短语结构树库还是依存结构树库,都以句子为基本的描述单位。依存语法是一种词例化语法,它不采用短语结构的语法概念,而直接描写句子中词与词之间的依存关系,即认为句子中任何两个具有依存关系的词中必有一个是中心词(支配词),而另一个是被支配词。因此,依存语法直接体现了一种语言的句法层面和语义层面之间的天然联系。这充分说明,黎锦熙先生在《新著国语文法》中倡导的句本位语法体系和中心词分析法具有鲜活的生命力。它们不仅在我国解放前后的中学语文教学中数十年长盛不衰,而且至今仍在指导着树库的建设和应用。
A treebank is a text corpus with syntactic annotation. It records the syntactic tree, i.e. the syntactic parse, of every sentence in running text. Since the 1990s, automatic parsing of natural languages has again become a focus of the international computational linguistics community, and one crucial reason is the completion of the Penn Treebank (PTB). Statistical parsers for English that use a Probabilistic Context-Free Grammar (PCFG) induced automatically from the treebank significantly outperform the earlier rule-based and unification-based parsers. Treebanks built for the world's languages, whether represented with phrase structures or with dependency structures, take the sentence as the basic unit of description. Dependency Grammar is a lexicalized grammar; it dispenses with the notion of phrase structure and directly describes the word-to-word dependency relations in a sentence: of any two words linked by a dependency relation, one is the head (the governor) and the other is its dependent. Dependency Grammar therefore directly reflects the natural connection between the syntactic and semantic levels of a language. All of this attests to the vitality of the sentence-based syntax and the head-driven sentence analysis method advocated by Jinxi Li in his Xinzhu Guoyu Wenfa (《新著国语文法》): they not only dominated grammar teaching in Chinese middle schools for more than half a century, before and after the founding of the People's Republic of China, but also continue to guide the construction and application of treebanks today.