CAJ | Academic paper

As the Semantic Web develops, the Resource Description Framework (RDF) data published on the Web has reached the scale of ten billion triples and is growing geometrically, so stand-alone query methods for the SPARQL Protocol and RDF Query Language (SPARQL) are no longer adequate. To address this problem, this paper proposes a SPARQL Basic Graph Pattern (BGP) query algorithm based on the Bulk Synchronous Parallel (BSP) model. Based on the directed-graph nature of RDF data and the definition of a BGP, the algorithm divides query processing into a matching stage and an iteration stage: it first matches the triple patterns required by the query, then iterates so that partial solutions progressively approach complete solutions, yielding the final query results. The algorithm is implemented on the HAMA distributed computing framework. Experimental results show that it achieves higher query efficiency than a MapReduce-based SPARQL query algorithm and can support fast SPARQL queries over large-scale RDF data.
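The two stages named in the abstract can be illustrated with a small single-process sketch. This is not the paper's algorithm and it does not use the HAMA API: it only assumes that the matching stage collects candidate bindings per triple pattern, and that each loop round in the iteration stage plays the role of one BSP superstep that extends every partial solution by one more pattern. All names (`match_pattern`, `merge`, `evaluate_bgp`) and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Treat terms that start with '?' as query variables (a convention of this sketch)."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return the binding that makes `pattern` equal `triple`, or None if there is none."""
    binding: Binding = {}
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable repeated inside one pattern must bind consistently.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def merge(a: Binding, b: Binding) -> Optional[Binding]:
    """Join two bindings; return None if they disagree on a shared variable."""
    for var, value in b.items():
        if a.get(var, value) != value:
            return None
    return {**a, **b}


def evaluate_bgp(triples: List[Triple], bgp: List[Triple]) -> List[Binding]:
    # Matching stage: candidate bindings for every triple pattern.
    candidates: List[List[Binding]] = []
    for pattern in bgp:
        matches = []
        for triple in triples:
            binding = match_pattern(pattern, triple)
            if binding is not None:
                matches.append(binding)
        candidates.append(matches)

    # Iteration stage: each round stands in for one superstep and extends
    # every partial solution with the matches of one more pattern.
    partial: List[Binding] = [{}]
    for matches in candidates:
        extended = []
        for solution in partial:
            for match in matches:
                merged = merge(solution, match)
                if merged is not None:
                    extended.append(merged)
        partial = extended
    return partial


# Hypothetical sample data: who knows someone that works at some company?
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]
bgp = [("?x", "knows", "?y"), ("?y", "worksAt", "?c")]
results = evaluate_bgp(triples, bgp)
for row in results:
    print(row)
```

In a distributed setting the partial solutions would be exchanged between workers as messages at each synchronization barrier. The sketch keeps them in one list so that only the match-then-iterate structure is visible.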