计算机研究与发展
計算機研究與髮展
계산궤연구여발전
JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT
2009年
z2期
748-755
,共8页
向永清%邓志鸿%于航%高宁
嚮永清%鄧誌鴻%于航%高寧
향영청%산지홍%우항%고저
XML%二级索引%关键词%检索%SLCA%栈算法
XML%二級索引%關鍵詞%檢索%SLCA%棧算法
XML%이급색인%관건사%검색%SLCA%잔산법
XML%two-layer index%keyword retrieval%SLCA%stack algorithm
随着互联网上XML文档的大量增加,如何高效地索引、存储和检索这些XML数据成为一个非常值得深入研究的课题.目前,在XML关键词检索方面,主流的检索系统都是建立在一级索引的基础上.一级索引存在两个明显的缺点:1)索引的冗余度比较高;2)索引的可扩展性和灵活性较差.通过结合传统倒排索引和基于杜威编码的XML节点索引的优点,提出面向XML文档的二级索引模型,并把该模型应用于求解XML关键词检索中的SLCA,实现了基于二级索引的求解SLCA的栈算法.实验表明,二级索引模型能够节省约30%的空间开销,在时间效率方面,基于二级索引的栈算法在效率上比基于一级索引的栈算法要高1个数量级左右,并且随着关键词数目的增加,这种效率优势会越加明显.
隨著互聯網上XML文檔的大量增加,如何高效地索引、存儲和檢索這些XML數據成為一箇非常值得深入研究的課題.目前,在XML關鍵詞檢索方麵,主流的檢索繫統都是建立在一級索引的基礎上.一級索引存在兩箇明顯的缺點:1)索引的冗餘度比較高;2)索引的可擴展性和靈活性較差.通過結閤傳統倒排索引和基于杜威編碼的XML節點索引的優點,提齣麵嚮XML文檔的二級索引模型,併把該模型應用于求解XML關鍵詞檢索中的SLCA,實現瞭基于二級索引的求解SLCA的棧算法.實驗錶明,二級索引模型能夠節省約30%的空間開銷,在時間效率方麵,基于二級索引的棧算法在效率上比基于一級索引的棧算法要高1箇數量級左右,併且隨著關鍵詞數目的增加,這種效率優勢會越加明顯.
수착호련망상XML문당적대량증가,여하고효지색인、존저화검색저사XML수거성위일개비상치득심입연구적과제.목전,재XML관건사검색방면,주류적검색계통도시건립재일급색인적기출상.일급색인존재량개명현적결점:1)색인적용여도비교고;2)색인적가확전성화령활성교차.통과결합전통도배색인화기우두위편마적XML절점색인적우점,제출면향XML문당적이급색인모형,병파해모형응용우구해XML관건사검색중적SLCA,실현료기우이급색인적구해SLCA적잔산법.실험표명,이급색인모형능구절성약30%적공간개소,재시간효솔방면,기우이급색인적잔산법재효솔상비기우일급색인적잔산법요고1개수량급좌우,병차수착관건사수목적증가,저충효솔우세회월가명현.
With the rapid increasing of XML documents on the Web,how to index,store and retrieve these documents has become a very popular and valuable problem.At present,there are two normal ways of retrieving XML documents.One is structure-based retrievaI,such as XPath and XQuery;the other is keyword-based retrieval.In the aspect of keyword-based XML retrieval,a majority of systems and algorithms are built based on one-layer index.However,one-layer index has two disadvantages:firstly,it may cause redundancy;secondly,it is not easy to be updated.In this paper,a new XML index model called two-layer index model is proposed,which considers both the advantages of traditional inverted index and dewey-code based inverted index.Moreover,a new stack algorithm based on two-layer index is proposed in order to rapidly get SLCA results from XML document sets.At last,the results of building two-layer index on Wiki document set and applying the stack algorithm to get SLCA results from two-layer index are presented,which show the efficiency of the proposed index model and algorithm.