中文信息学报
JOURNAL OF CHINESE INFORMATION PROCESSING
2010, Issue 1, pp. 54-59 (6 pages)
computer application%Chinese information processing%Web%domain concept instance extraction%attribute extraction%weakly supervised%contextual pattern
This paper proposes a weakly supervised, Web-based method for simultaneously extracting ontology concept instances and attributes. Starting from a small set of seed instances and attributes, the method automatically acquires from the Web the contextual patterns in which instances and attributes co-occur, and it scores these patterns using the association between the seed instances and the seed attributes. After candidate concept instances and attributes have been extracted with the contextual patterns, two measures are used to assess them. The first uses the association between concept instances and attributes, so that each evaluates the accuracy of the other. The second uses the similarity between a candidate instance (or attribute) and the seed instances (or attributes) in their distribution over the contextual patterns. Experiments on the disease domain show that, under manual verification, the precision of the candidate instances reaches 94% for the top 500 results and 93% for the top 1000 results.
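The second evaluation measure in the abstract, which compares how a candidate and the seeds are distributed over contextual patterns, can be illustrated with a minimal sketch. The abstract does not state the similarity function or the pattern format, so everything below is an assumption made for illustration: cosine similarity is swapped in as the similarity measure, the pattern strings and counts are invented toy data, and the function names `cosine`, `seed_profile`, and `rank_candidates` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse pattern-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def seed_profile(seed_vectors):
    """Sum the pattern counts of all seeds into one aggregate vector."""
    total = Counter()
    for vec in seed_vectors:
        total.update(vec)
    return total


def rank_candidates(candidates, seed_vectors):
    """Rank candidates by similarity to the aggregate seed profile."""
    profile = seed_profile(seed_vectors)
    scored = [(name, cosine(vec, profile)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (invented): how often each term was seen in each contextual pattern.
seeds = [
    Counter({"symptoms of X": 4, "treatment for X": 3, "X is caused by": 1}),
    Counter({"symptoms of X": 2, "treatment for X": 5}),
]
candidates = {
    "diabetes": Counter({"symptoms of X": 3, "treatment for X": 2}),
    "london": Counter({"flights to X": 5}),
}

ranked = rank_candidates(candidates, seeds)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

In this sketch a candidate that shares no contextual pattern with the seeds receives a score of zero and falls to the bottom of the ranking, which matches the intuition behind the measure. The paper's actual scoring may differ.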