计算机工程与设计
COMPUTER ENGINEERING AND DESIGN
2015
Issue 4
pp. 1094-1097, 1121
5 pages in total
马志强%张泽广%李昊盨%刘利民
distributed web crawler%single-node server architecture%multi-node server architecture%focused web crawler%accuracy rate
To address the low collection efficiency and poor scalability of focused web crawlers, this paper studies distributed crawler architectures and designs a LAN-based focused crawling system with a multi-node server architecture. The design avoids the performance degradation of the server control end that occurs in client/server mode as the volume of collected web data grows, and it overcomes the data delay caused in autonomous mode by the growing amount of information exchanged across the network. A single-node server architecture system and a LAN-based multi-node server architecture system were built for experiments, and the two were compared on collection speed, average collection speed, and accuracy rate. The results indicate that the multi-node server architecture clearly outperforms the single-node server architecture.