地球信息科学学报
GEO-INFORMATION SCIENCE
2014
Issue 5
pp. 681-690
10 pages in total
袁烨城%刘海江%裴韬%高锡章
空间关系识别%自动机%空间词汇%依存关系%语义知识
spatial relation extraction%finite automata%spatial word%syntactic dependence%semantic knowledge
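Identifying spatial relations in natural-language text (news reports, blogs, forums, social networks, etc.) is one of the important means of obtaining spatial information in the era of big data. Existing methods consider only character and word features, so the recognition process is prone to matching ambiguity. To address this limitation, this paper proposes a new spatial relation recognition method that incorporates semantic knowledge such as lexical and syntactic information. The method defines a tree-structured extraction pattern: tree nodes represent spatial word types, and the links between nodes represent the dependency relations between the words. The extraction patterns can be learned automatically from an annotated corpus. Pattern matching treats spatial word types and syntactic dependency relations as hard constraints and lexical semantic similarity as a soft constraint; after a pattern is converted from its tree structure into a dependency sequence, matching is carried out on the principle of finite automata. Experimental results show that the method achieves a precision of 86.67% and a recall of 63.11%. Compared with existing rule-based methods, it has two advantages: (1) pattern learning requires no manual intervention; (2) incorporating syntactic dependency relations eliminates matching ambiguity and improves recognition accuracy.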
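The conversion of a tree-structured pattern into a dependency sequence, as described in the abstract, might look roughly like the sketch below. This is only an illustration under stated assumptions: the class `PatternNode`, the function `to_dependency_sequence`, the depth-first traversal order, the category identifiers (`geo_entity`, `locative_noun`, `spatial_predicate`) and the dependency labels (`SBV`, `VOB`, `ATT`) are all hypothetical choices made here, not details given in the record.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PatternNode:
    """One node of a hypothetical extraction pattern tree."""
    word_type: str  # spatial word category, e.g. "spatial_predicate"
    # Each child is stored with the dependency label that links it to this node.
    children: List[Tuple[str, "PatternNode"]] = field(default_factory=list)


def to_dependency_sequence(node: PatternNode) -> List[Tuple[str, str, str]]:
    """Flatten the tree depth-first into (head_type, dep_label, child_type) triples.

    The traversal order is an assumption; the record does not specify it.
    """
    sequence = []
    for label, child in node.children:
        sequence.append((node.word_type, label, child.word_type))
        sequence.extend(to_dependency_sequence(child))
    return sequence


# Illustrative pattern for a sentence shaped like "<place> lies in the north of <place>".
example_pattern = PatternNode(
    "spatial_predicate",
    [
        ("SBV", PatternNode("geo_entity")),
        ("VOB", PatternNode("locative_noun", [("ATT", PatternNode("geo_entity"))])),
    ],
)
example_sequence = to_dependency_sequence(example_pattern)
```

Flattening to triples keeps both hard constraints (word type and dependency label) in each element, so a sequential matcher can check them one step at a time.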
Extracting spatial relations from natural-language text (news, journals, blogs, social networks, etc.) is an important means of obtaining spatial information in the era of big data. Earlier methods for extracting spatial relations from Chinese text focused only on the features of Chinese characters and phrases, which easily causes ambiguous matching. This paper presents a new rule-based method that integrates lexical, syntactic and semantic knowledge. An extraction rule in this method is composed of spatial words and the syntactic dependencies between them, which jointly form a tree structure: the tree nodes represent the spatial words, and they are connected by syntactic dependencies. Spatial words are words that can be used to express spatial relations; they are classified into six categories: geographical entities, prepositions, locative nouns, spatial predicates, metaphorical spatial nouns and assistant words. In rule matching, a finite automaton is used to identify new spatial relation instances that satisfy two conditions: (1) the same syntactic dependency structure as the extraction rule; (2) similar spatial words. Part of speech and semantic similarity are used to measure the consistency between spatial words. An experiment on extracting direction relations from the Encyclopedia of China shows that the precision and recall of this method reach 86.67% and 63.11% respectively, which is better than the earlier methods. Compared with the earlier methods, the improvements are: (1) generating extraction rules does not require human intervention; (2) ambiguous matching is reduced by integrating syntactic dependency knowledge, which improves the performance of spatial relation identification.
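The two matching conditions can be illustrated with a small linear automaton that advances one state each time a candidate dependency arc satisfies the current rule element. This is a minimal sketch, not the authors' implementation: the names `Arc`, `toy_similarity` and `matches`, the threshold of 0.7, the hand-made synonym sets, the dependency labels, and the choice to let the automaton skip non-matching arcs are all assumptions introduced for illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class Arc(NamedTuple):
    """One dependency arc: head category, dependency label, child category, child word."""
    head_type: str
    dep: str
    child_type: str
    child_word: Optional[str]  # None in a rule means "any word of this category"


def toy_similarity(a: str, b: str) -> float:
    """Placeholder for a real semantic similarity measure (assumption)."""
    synonym_sets = [{"north", "northern part"}]
    if a == b:
        return 1.0
    return 0.8 if any(a in s and b in s for s in synonym_sets) else 0.0


def matches(rule: List[Arc], candidate: List[Arc],
            similarity: Callable[[str, str], float] = toy_similarity,
            threshold: float = 0.7) -> bool:
    """Run a linear automaton: state i means the first i rule arcs have been matched."""
    state = 0
    for arc in candidate:
        if state == len(rule):
            break
        expected = rule[state]
        # Condition (1): identical word categories and dependency label.
        same_structure = (arc.head_type, arc.dep, arc.child_type) == (
            expected.head_type, expected.dep, expected.child_type)
        # Condition (2): the word is similar enough to the rule's word, if one is given.
        similar_word = expected.child_word is None or (
            arc.child_word is not None
            and similarity(arc.child_word, expected.child_word) >= threshold)
        if same_structure and similar_word:
            state += 1
    return state == len(rule)  # accept only if the final state was reached


rule = [
    Arc("spatial_predicate", "SBV", "geo_entity", None),
    Arc("spatial_predicate", "VOB", "locative_noun", "north"),
    Arc("locative_noun", "ATT", "geo_entity", None),
]
good = [
    Arc("spatial_predicate", "SBV", "geo_entity", "Hebei"),
    Arc("spatial_predicate", "VOB", "locative_noun", "northern part"),
    Arc("locative_noun", "ATT", "geo_entity", "China"),
]
# Same words, but the second arc carries a different dependency label.
bad_dep = [good[0], good[1]._replace(dep="ADV"), good[2]]
# Same structure, but the locative noun is not similar to the rule's word.
bad_word = [good[0], good[1]._replace(child_word="south"), good[2]]
```

Here `matches(rule, good)` accepts, while `bad_dep` and `bad_word` are rejected because they fail the structural and the similarity condition respectively. Geographical entities are left as wildcards since any place name should be allowed in those slots.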