计算机应用研究
計算機應用研究
계산궤응용연구
APPLICATION RESEARCH OF COMPUTERS
2009
Issue 8
2952-2955
4 pages
林其东%陈传波%郑乐丹%张一曼
林其東%陳傳波%鄭樂丹%張一曼
림기동%진전파%정악단%장일만
数字图书馆%主题%爬行器%搜索引擎%EPR算法
數字圖書館%主題%爬行器%搜索引擎%EPR算法
수자도서관%주제%파행기%수색인경%EPR산법
digital library%topic-specific%crawler%search engine%EPR algorithm
提出构建数字图书馆主题搜索引擎的总体系统设计.利用一个预处理系统尽量选择高质量的种子站点,从而产生Web主题定义数据;在系统控制器的协调下,各主题爬行器同步地采集爬行器所推荐的Web资源,对下载的资源进行文本分类与主题识别;将已经下载的Web资源按学科分类存储在Web主题资源库中,通过全局信息库建立索引,接入通用接口进行依主题检索.依赖数字图书馆各方面特点,提出支持多线程主题爬行器的设计,并提出一种新颖的URL主题相关性剪切算法EPR,为实现数字图书馆主题搜索引擎原型提供重要的设计.基于开源Lucene平台进行系统扩展而形成最终系统,实验结果表明该工作是相当有效的,尤其是提出的相关性判别算法EPR,具有相当的创新性和实际应用价值.
提出構建數字圖書館主題搜索引擎的總體系統設計.利用一個預處理系統儘量選擇高質量的種子站點,從而產生Web主題定義數據;在系統控制器的協調下,各主題爬行器同步地採集爬行器所推薦的Web資源,對下載的資源進行文本分類與主題識別;將已經下載的Web資源按學科分類存儲在Web主題資源庫中,通過全局信息庫建立索引,接入通用接口進行依主題檢索.依賴數字圖書館各方面特點,提出支持多線程主題爬行器的設計,並提出一種新穎的URL主題相關性剪切算法EPR,為實現數字圖書館主題搜索引擎原型提供重要的設計.基於開源Lucene平台進行系統擴展而形成最終系統,實驗結果表明該工作是相當有效的,尤其是提出的相關性判別算法EPR,具有相當的創新性和實際應用價值.
제출구건수자도서관주제수색인경적총체계통설계.이용일개예처리계통진량선택고질량적충자참점,종이산생Web주제정의수거;재계통공제기적협조하,각주제파행기동보지채집파행기소추천적Web자원,대하재적자원진행문본분류여주제식별;장이경하재적Web자원안학과분류존저재Web주제자원고중,통과전국신식고건립색인,접입통용접구진행의주제검색.의뢰수자도서관각방면특점,제출지지다선정주제파행기적설계,병제출일충신영적URL주제상관성전절산법EPR,위실현수자도서관주제수색인경원형제공중요적설계.기우개원Lucene평태진행계통확전이형성최종계통,실험결과표명해공작시상당유효적,우기시제출적상관성판별산법EPR,구유상당적창신성화실제응용개치.
This paper presents an overall system design for a topic-specific search engine for digital libraries. A preprocessing system selects high-quality seed sites wherever possible and uses them to generate Web topic definition data. Coordinated by a system controller, the topic crawlers synchronously collect the Web resources that the crawlers recommend, then perform text classification and topic identification on the downloaded resources. The downloaded resources are stored by academic discipline in a Web topic resource repository, indexed through a global information database, and made searchable by topic through a common interface. Drawing on the characteristics of digital libraries, the paper proposes a design for a multithreaded topic-specific crawler and a novel URL topic-relevance pruning algorithm, EPR, as key design elements for a prototype topic-specific search engine for digital libraries. The final system was built by extending the open-source Lucene platform. Experimental results show that the approach is effective and that the EPR relevance-judgment algorithm in particular is innovative and of practical value.
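The abstract does not describe how EPR works, so the sketch below is not EPR. It is only a generic illustration of the idea of topic-relevance pruning in a focused crawler, under stated assumptions: the topic keywords (`TOPIC_TERMS`), the in-memory page graph (`PAGES`), the word-overlap score (`relevance`), and the `threshold` parameter are all hypothetical and do not come from the paper. It is single-threaded and uses no network access or Lucene indexing.

```python
import heapq

# Hypothetical topic definition, standing in for the output of a
# preprocessing step. These keywords are an assumption, not from the paper.
TOPIC_TERMS = {"digital", "library", "catalog", "metadata"}

# Toy in-memory "Web": URL -> (page text, outgoing links). Not real sites.
PAGES = {
    "seed": ("digital library portal", ["a", "b"]),
    "a": ("library catalog and metadata search", ["c"]),
    "b": ("sports news and weather", ["d"]),
    "c": ("metadata standards for digital collections", []),
    "d": ("digital library of football scores", []),
}


def relevance(text, topic_terms=TOPIC_TERMS):
    """Fraction of words in the text that are topic terms (0.0 to 1.0)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def crawl(seeds, threshold=0.3):
    """Best-first crawl that does not follow links from low-relevance pages."""
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    while frontier:
        _, url = heapq.heappop(frontier)
        text, links = PAGES[url]
        score = relevance(text)
        if score < threshold:
            continue  # prune: drop the page and never enqueue its links
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link is prioritised by the score of the page that cites it.
                heapq.heappush(frontier, (-score, link))
    return kept


if __name__ == "__main__":
    print(crawl(["seed"]))
```

In this toy graph, page `b` scores below the threshold, so its outgoing link to `d` is never fetched even though `d` contains topic terms. That is the usual trade-off of pruning by parent relevance: fewer downloads at the cost of possibly missing relevant pages reachable only through off-topic ones.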