软件学报
JOURNAL OF SOFTWARE
2014, Issue 9, pp. 2088-2101 (14 pages)
Ouyang Dantong (欧阳丹彤)%Qu Jianfeng (瞿剑峰)%Ye Yuxin (叶育鑫)
distant supervision%relation extraction%ontology
Distant supervision is a learning method well suited to relation extraction on big data. It supplies a learning algorithm with large-scale sample data by aligning relation instances in a knowledge base with natural-language sentences in a text corpus. This paper uses ontologies to expand relation instances automatically, addressing the shortage of instances for some target relations in distant-supervision-based relation extraction. First, a relation coverage ratio and an axiom volume ratio are defined and used to find ontologies that are closely related to the extraction task. Second, instance queries in ontology reasoning add new relation instances under the relations to be extracted. Finally, the new relation instances are aligned with natural-language sentences in the corpus, which expands the training samples. Experimental results show that the ontology-based sample expansion method can effectively handle relation extraction tasks whose training samples are scarce, which standard distant supervision cannot do for lack of instances, and thus further improves the relation extraction capability of distant supervision on big data.
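The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the definitions of the coverage and volume ratios, so the ontology selection step is omitted, and ontology reasoning is reduced here to a single assumed rule (sub-property inheritance). All names (`expand_with_subproperties`, `align`, `KB`, `SUB_PROPERTY_OF`, `SENTENCES`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def expand_with_subproperties(
    triples: Set[Triple], sub_property_of: Dict[str, str]
) -> Set[Triple]:
    """Infer (s, parent, o) from (s, child, o) along sub-property axioms.

    Assumption: this one rule stands in for the ontology reasoning
    step; a real reasoner would apply many more axiom types.
    """
    expanded = set(triples)
    for s, r, o in triples:
        seen = {r}  # guards against cyclic sub-property declarations
        while r in sub_property_of and sub_property_of[r] not in seen:
            r = sub_property_of[r]
            seen.add(r)
            expanded.add((s, r, o))
    return expanded


def align(triples: Set[Triple], sentences: List[str]) -> Dict[str, List[str]]:
    """Distant-supervision alignment: a sentence mentioning both entities
    of a relation instance becomes a training sample for that relation."""
    samples: Dict[str, List[str]] = {}
    for s, r, o in sorted(triples):
        for sentence in sentences:
            if s in sentence and o in sentence:
                samples.setdefault(r, []).append(sentence)
    return samples


# Toy data (hypothetical): the target relation "parentOf" has no instances.
KB: Set[Triple] = {("Alice", "motherOf", "Bob")}
SUB_PROPERTY_OF: Dict[str, str] = {"motherOf": "parentOf"}
SENTENCES: List[str] = [
    "Alice took her son Bob to school.",
    "Bob lives in Paris.",
]

before = align(KB, SENTENCES)
after = align(expand_with_subproperties(KB, SUB_PROPERTY_OF), SENTENCES)
```

In this toy setting, `before` contains no samples for `parentOf`, while `after` does, because the inferred instance aligns with the first sentence. The substring match is a deliberate simplification; a practical system would rely on entity linking rather than raw string containment.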