计算机工程
COMPUTER ENGINEERING
2010
Issue 1
pp. 280-282
3 pages in total
王锋%王伟%张璟%罗作民
Web crawler%URL dispatch%DNS resolution%Hash algorithm
To address the key factors that currently limit crawler efficiency, this paper studies the internal operating mechanism of a crawler program, optimizes its architecture, and improves the relevant algorithms within it. Tests of the implemented crawler in a Linux network environment indicate that the proposed solution and improvements are feasible and that they raise both page-fetching efficiency and the crawler's overall performance.