计算机研究与发展
JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT
2010
Issue 2
255-263
, 9 pages in total
semantic Web%search engine%semantic Web document search%RDF sentence%snippet extraction
semantic Web%search engine%RDF document search%RDF sentence%snippet generation
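Semantic Web document search is an important means of discovering semantic Web data. To address the shortcomings of traditional information retrieval methods, a method for constructing document term vectors based on RDF sentences is proposed. First, a document is treated as a set of RDF sentences, so that RDF sentence-based structural information is preserved during document analysis and indexing. Second, the definition of the authoritative description of a resource is introduced, which makes interlinked semantic Web data searchable across document boundaries. In addition, the traditional inverted index structure is extended so that the system can extract snippets that are easier to read and understand. Experiments on a large-scale real-world data set show that the method significantly improves the efficiency of document retrieval and brings a clear improvement in usability.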
Keyword-based semantic Web document search is one of the most efficient approaches for finding semantic Web data. Most existing approaches are based on traditional IR technologies, in which documents are modeled as bags of words. The authors identify the difficulties these technologies face in processing RDF documents, namely preserving data structures, processing linked data, and generating snippets. An approach is proposed that models a semantic Web document from its abstract syntax, the RDF graph. In this approach, a document is modeled as a set of RDF sentences, which preserves the RDF sentence-based structures during document analysis and indexing. Authoritative descriptions of named resources are also introduced, making linked data searchable across document boundaries. Furthermore, to help users quickly determine whether a result is relevant, the traditional inverted index structure is extended to enable more understandable snippets to be extracted from matched documents. Experiments on real-world data show that this approach can significantly improve the precision and recall of semantic Web document search. Precision at the top-ranked result improves by up to 19%, and a steady improvement of nearly 10% is observed. Over 50 random queries, recall increases by up to 60% on average. Remarkable improvements in system usability are also obtained.
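The idea of indexing a document as a set of RDF sentences, with postings that point back to the matching sentence for snippet extraction, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an RDF sentence is a group of triples connected through shared blank nodes (the abstract does not give the definition), that triples are plain string tuples, and that blank nodes start with "_:". All function names and sample data are hypothetical, and authoritative descriptions and ranking are left out.

```python
import re
from collections import defaultdict


def is_blank(node):
    # Assumption: blank nodes are written with the "_:" prefix.
    return node.startswith("_:")


def rdf_sentences(triples):
    """Group triples that share a blank node (assumed RDF sentence definition)."""
    parent = list(range(len(triples)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple mentioning it
    for i, (s, _p, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_seen:
                    parent[find(i)] = find(first_seen[node])
                else:
                    first_seen[node] = i

    groups = defaultdict(list)
    for i, triple in enumerate(triples):
        groups[find(i)].append(triple)
    return list(groups.values())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index whose postings are (doc_id, sentence_no), not just doc_id."""
    index = defaultdict(set)
    sentences = {}
    for doc_id, triples in docs.items():
        for k, sentence in enumerate(rdf_sentences(triples)):
            sentences[(doc_id, k)] = sentence
            for triple in sentence:
                for term in triple:
                    for token in tokenize(term):
                        index[token].add((doc_id, k))
    return index, sentences


def search(index, sentences, query):
    """Return, per matching document, the RDF sentences to show as a snippet."""
    postings = set()
    for token in tokenize(query):
        postings |= index.get(token, set())
    hits = defaultdict(list)
    for doc_id, k in sorted(postings):
        hits[doc_id].append(sentences[(doc_id, k)])
    return dict(hits)


# Hypothetical sample data.
DOCS = {
    "doc1": [
        ("ex:alice", "foaf:name", '"Alice"'),
        ("ex:alice", "foaf:knows", "_:b1"),
        ("_:b1", "foaf:name", '"Bob"'),
    ],
    "doc2": [("ex:carol", "foaf:name", '"Carol"')],
}
INDEX, SENTENCES = build_index(DOCS)
```

In this sketch a query for "bob" returns the whole two-triple sentence from doc1, so the snippet shows that Bob is someone Alice knows, rather than an isolated triple about a blank node.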