计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014, Issue 11, pp. 1345-1357 (13 pages)
夏家莉, 程春雷, 陈辉, 曹重华, 李光泉
entity relationship; predicate concept model (PCM); concept association degree; concept connectivity
Chinese entity relation extraction is a research focus in open-domain text retrieval and knowledge discovery. Traditional extraction strategies commonly suffer from heavy manual annotation, limited pattern generality and relatively fixed relation granularity, which restrict their extraction performance in open domains. Based on the hierarchical structure and relational connectivity of concepts, this paper builds a predicate concept model (PCM) for Chinese entity relations. On this basis, it proposes PCIA, a predicate concept acquisition strategy based on incremental learning, and PCCS, a relation extraction strategy based on predicate concept connectivity, and uses them to extract loosely coupled, long-distance entity relations in open domains. Each predicate concept is constructed relatively independently and concepts can be combined flexibly, so the resulting relation descriptions are more general and more interpretable, providing an effective means of identifying and extracting unknown relations in open domains. Experimental results show that PCCS improves the quality of Chinese entity identification and of entity connectivity path selection, and achieves good relation extraction performance.
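The abstract does not specify how PCCS selects entity connectivity paths. Purely as an illustration of the general idea of linking two distant entities through a chain of predicate concepts, the following is a minimal sketch that runs a breadth-first search over a small hypothetical graph. The graph contents, the `Entity:`/`Pred:` node naming and the `connectivity_path` function are assumptions made for this example; they are not the authors' PCM data structure or the PCCS algorithm.

```python
from collections import deque

# Hypothetical toy graph (assumption, not the paper's PCM): entity nodes are
# linked to each other only through predicate-concept nodes.
GRAPH = {
    "Entity:A": ["Pred:founded"],
    "Pred:founded": ["Entity:A", "Entity:B"],
    "Entity:B": ["Pred:founded", "Pred:located_in"],
    "Pred:located_in": ["Entity:B", "Entity:C"],
    "Entity:C": ["Pred:located_in"],
}


def connectivity_path(graph, source, target):
    """Return a shortest node path from source to target via BFS, or None."""
    parents = {source: None}  # records how each visited node was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None  # target is not connected to source


path = connectivity_path(GRAPH, "Entity:A", "Entity:C")
# Keep only the predicate-concept nodes: the chain that would describe
# the long-distance relation between A and C in this toy setting.
predicates = [n for n in path if n.startswith("Pred:")]
print(path)
print(predicates)
```

Here `Entity:A` and `Entity:C` never co-occur directly, yet they are connected through `Pred:founded` and `Pred:located_in`; BFS is used only because it returns a shortest such chain, and a real system would presumably weight paths (for example by concept association degree) rather than count hops.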