电脑开发与应用
COMPUTER DEVELOPMENT & APPLICATIONS
2014, Issue 9, pp. 44-47 (4 pages)
theme extraction; domain ontology; synonym/near-synonym; TF-IDF
To make extracted topic words better reflect the content of domain documents, this paper proposes an ontology-based topic extraction method for domain documents. Exploiting the characteristics of domain documents, the method filters each document's vocabulary with a domain ontology, which removes interference from high-frequency non-domain words and reduces the dimensionality of the vocabulary, thereby improving both algorithm efficiency and extraction quality. It then uses a synonym/near-synonym dictionary to merge candidate topic words and their weights, which reduces the effect of synonyms and near-synonyms on the extraction result and makes the result more comprehensive and accurate. Experiments show that the method achieves relatively high precision and recall.
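The pipeline described in the abstract (ontology filtering, TF-IDF weighting, then synonym merging) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `extract_topics`, the representation of the ontology as a plain set of terms, the synonym dictionary as a term-to-canonical-term mapping, and the smoothed IDF formula are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def extract_topics(docs, doc_index, ontology_terms, synonyms, top_k=3):
    """Return up to top_k topic words for docs[doc_index].

    docs           -- list of tokenized documents (lists of strings)
    ontology_terms -- set of domain terms (hypothetical stand-in for a domain ontology)
    synonyms       -- dict mapping a term to its canonical form (hypothetical dictionary)
    """
    # Step 1: keep only words that appear in the domain ontology.
    # This drops high-frequency non-domain words and shrinks the vocabulary.
    filtered = [[t for t in doc if t in ontology_terms] for doc in docs]
    target = filtered[doc_index]
    if not target:
        return []

    # Step 2: TF-IDF weights for the remaining candidate words.
    # The smoothed IDF used here is an assumption, not taken from the paper.
    n_docs = len(filtered)
    weights = {}
    for term, count in Counter(target).items():
        df = sum(1 for doc in filtered if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[term] = (count / len(target)) * idf

    # Step 3: merge synonyms/near-synonyms by summing their weights
    # under one canonical term.
    merged = Counter()
    for term, weight in weights.items():
        merged[synonyms.get(term, term)] += weight

    return [term for term, _ in merged.most_common(top_k)]


# Toy usage with made-up data.
docs = [
    ["the", "computer", "pc", "network", "the", "the", "computer"],
    ["the", "network", "protocol"],
    ["the", "protocol", "router"],
]
ontology_terms = {"computer", "pc", "network", "protocol", "router"}
synonyms = {"pc": "computer"}
topics = extract_topics(docs, 0, ontology_terms, synonyms)
print(topics)
```

In this toy run the frequent non-domain word "the" never becomes a candidate, and the weight of "pc" is folded into "computer" rather than competing with it, which is the effect the abstract attributes to the ontology filter and the synonym merge.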