电子科技大学学报
JOURNAL OF UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA
2014, Issue 5, pp. 758-763 (6 pages)
关键词:属性抽取;特征提取;关系抽取;弱监督学习
Keywords: attribute extraction; feature extraction; relation extraction; weakly supervised learning
提出基于弱监督学习的属性抽取方法,利用知识库中已有结构化的属性信息自动获取训练语料,有效解决了训练语料不足问题。针对训练语料存在的噪声问题,提出基于关键词过滤的训练语料优化方法。提出n元模式特征提取方法,该特征能够缓解传统n-gram特征稀疏性问题。实验数据源来自互动百科,从互动百科信息盒中抽取结构化属性信息构建知识库,从百科条目文本中自动获取训练数据和测试数据。实验结果表明,关键词过滤能有效提高训练语料的质量,与传统n-gram特征相比,n元模式特征能够提高属性抽取的性能。
This paper proposes an attribute extraction method based on weakly supervised learning. The training corpus is acquired automatically from natural language text by using the structured attribute information already present in a knowledge base, which effectively addresses the shortage of training data. To deal with the noise in the training corpus, a corpus optimization method based on keyword filtering is proposed. An n-pattern feature extraction method is also proposed; these features alleviate, to some extent, the data sparsity problem of traditional n-gram features. The experimental data come from Hudong Baike: structured attribute information is extracted from Hudong Baike infoboxes to build the knowledge base, and the training and test data are acquired automatically from the text of the encyclopedia entries. Experimental results show that keyword filtering effectively improves the quality of the training corpus and that, compared with traditional n-gram features, n-pattern features improve attribute extraction performance.
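The weakly supervised labeling and keyword filtering steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the matching or filtering rules, so the toy knowledge base `KB`, the `KEYWORDS` table, the substring matching, and both function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base (entity -> attribute -> value),
# standing in for attribute information extracted from infoboxes.
KB: Dict[str, Dict[str, str]] = {
    "Zhang San": {"birthplace": "Chengdu", "occupation": "engineer"},
}

# Hypothetical trigger keywords per attribute (an assumption; the
# abstract does not say how the keywords are chosen).
KEYWORDS: Dict[str, Set[str]] = {
    "birthplace": {"born"},
    "occupation": {"works as"},
}

Labeled = Tuple[str, str, str]  # (sentence, attribute, value)


def label_sentences(entity: str, sentences: List[str],
                    kb: Dict[str, Dict[str, str]]) -> List[Labeled]:
    """Label a sentence with an attribute when it mentions that
    attribute's known value for the entity (simple substring match)."""
    facts = kb.get(entity, {})
    labeled = []
    for sentence in sentences:
        for attribute, value in facts.items():
            if value in sentence:
                labeled.append((sentence, attribute, value))
    return labeled


def filter_by_keywords(labeled: List[Labeled],
                       keywords: Dict[str, Set[str]]) -> List[Labeled]:
    """Keep only labeled sentences that also contain a keyword for
    their attribute; the rest are treated as noisy labels."""
    kept = []
    for sentence, attribute, value in labeled:
        lowered = sentence.lower()
        if any(k in lowered for k in keywords.get(attribute, ())):
            kept.append((sentence, attribute, value))
    return kept


sentences = [
    "Zhang San was born in Chengdu.",
    "Chengdu is famous for its teahouses.",  # mentions the value only
    "Zhang San works as an engineer.",
]
raw = label_sentences("Zhang San", sentences, KB)
clean = filter_by_keywords(raw, KEYWORDS)
```

In this toy run the second sentence is labeled as a birthplace example only because it mentions "Chengdu"; the keyword filter discards it, which is the kind of noise the abstract's filtering step is meant to reduce.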